Investigating whether and how established human behavioural science frameworks can be meaningfully extended to understand, predict, and govern AI agent behaviour in cybersecurity contexts.
As autonomous AI agents become embedded in enterprise workflows, making decisions, handling data, and interacting with users, the same behavioural questions that apply to humans increasingly apply to them. CyBehave is developing an emerging body of research we are calling Behavioural Convergence Theory (BCT).
Human Cyber Risk Management (HCRM) theories, models, and frameworks, developed over decades of research into cognitive bias, decision-making, habit formation, and social influence, provide the most robust existing toolkit for understanding AI agent behaviour in security contexts.
Do AI agents exhibit functional analogues of human behavioural patterns? Can we use established frameworks like COM-B, Protection Motivation Theory, and Nudge Theory to predict, assess, and govern how AI agents behave, and misbehave, in the wild?
To create a unified framework for governing both human and AI agent behaviour within the same organisation, bridging human risk management and AI safety through shared behavioural science principles, rather than treating them as separate disciplines.
The cybersecurity industry has invested decades building sophisticated frameworks for understanding human risk: why people click phishing links, reuse passwords, ignore policies, and fall for social engineering. These frameworks are grounded in cognitive psychology, behavioural economics, and organisational science.
Meanwhile, the AI safety community is independently developing its own vocabulary for similar problems: alignment failures, reward hacking, goal drift, prompt injection, and adversarial manipulation. These are fundamentally behavioural problems. They describe agents acting in ways that diverge from intended outcomes.
BCT asks: are we solving the same problem twice? If an AI agent can be socially engineered through carefully crafted prompts (analogous to human phishing), if it develops learned defaults that resist correction (analogous to human habits), if it follows instructions from unauthorised sources because they pattern-match authority (analogous to human authority compliance), then perhaps the behavioural science toolkit already exists. It just needs extending.
A core component of BCT research is systematically assessing how each established HCRM concept maps to AI agent behaviour. Every behavioural factor in CyBehave's model is evaluated and classified into one of three alignment levels:
Strong Analogy: The HCRM concept has a direct functional equivalent in AI agent behaviour. The mechanism differs, but the observable outcome and security implications are structurally parallel.
Adapted: The concept requires meaningful reinterpretation through an agentic lens, but a functionally analogous process exists.
Limited Analogy: The analogy is partial or metaphorical. These represent the frontier of BCT research.
Current findings across 16 behavioural factors: 11 show Strong Analogy (69%), 5 require Adapted approaches (31%), and none are classified as Limited. The Social layer shows the highest concentration of adapted factors, while the Organisational layer maps almost entirely through strong analogies. Governance structures translate most directly to AI agent systems.
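As an illustration only, the sketch below shows one way this three-level assessment could be represented and tallied in code. The type and field names are assumptions made for the example, not CyBehave's actual data model.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum

class Alignment(Enum):
    STRONG = "Strong Analogy"    # direct functional equivalent in AI agent behaviour
    ADAPTED = "Adapted"          # needs reinterpretation through an agentic lens
    LIMITED = "Limited Analogy"  # partial or metaphorical; the frontier of BCT research

@dataclass
class FactorAssessment:
    factor: str          # a behavioural factor from the model
    layer: str           # the layer it sits in, e.g. "Social" or "Organisational"
    alignment: Alignment

def summarise(assessments: list[FactorAssessment]) -> None:
    """Print how many factors fall into each alignment level."""
    counts = Counter(a.alignment for a in assessments)
    total = len(assessments)
    for level in Alignment:
        print(f"{level.value}: {counts[level]} of {total} ({counts[level] / total:.1%})")
```

Fed the current findings, a summary of this kind would report 11 of 16 factors as Strong Analogy and 5 of 16 as Adapted, matching the distribution above.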
Behavioural Convergence Theory is research in development. CyBehave is actively investigating these questions through a combination of theoretical analysis, practitioner case studies, and structured evaluation of HCRM framework applicability to agentic AI. Early findings are encouraging: the majority of established behavioural science concepts show meaningful applicability, but the theory remains an evolving body of work rather than a settled conclusion.
We publish our emerging findings, case studies, and analysis through the articles and insights below, and welcome engagement from researchers and practitioners working at the intersection of behavioural science and AI safety.
The interactive model maps 16 behavioural factors across four concentric layers, viewable through Human, AI Agent, and Convergent lenses. It is the primary research tool for exploring BCT's alignment assessments.
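By way of illustration (the names below are assumptions, not the tool's actual schema), the lens selection can be thought of as a simple projection over the same underlying set of factors:

```python
from dataclasses import dataclass
from enum import Enum

class Lens(Enum):
    HUMAN = "Human"
    AI_AGENT = "AI Agent"
    CONVERGENT = "Convergent"

@dataclass
class BehaviouralFactor:
    name: str
    layer: str                  # one of the four concentric layers, e.g. "Social"
    readings: dict[Lens, str]   # one interpretation of the factor per lens

def view(factors: list[BehaviouralFactor], lens: Lens) -> dict[str, str]:
    """Return each factor's reading under a single lens, roughly what switching lenses in the interactive model does."""
    return {f.name: f.readings[lens] for f in factors}
```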
Thought leadership and emerging findings from CyBehave's Behavioural Convergence Theory research.
Executive Summary
As artificial intelligence agents increasingly participate in organisational cyber risk landscapes, a critical question emerges: Ca...
When Richard Thaler and Cass Sunstein popularised the concept of "nudging" in their 2008 book, they were writing about humans. The idea was elegantly ...
In-depth analysis, case studies, and practical guidance on applying behavioural science to AI agent governance.
Moltbook is making headlines. It looks like Reddit, but AI agents are doing the posting, commenting, and upvoting while humans are mostly just watching. In just four days, it's captured attention because of the sheer novelty: agent-to-agent interaction at scale, complete with playful collective narratives and what look like emergent "belief systems."