Investigating whether and how established human behavioural science frameworks can be meaningfully extended to understand, predict, and govern AI agent behaviour in cybersecurity contexts.
As autonomous AI agents become embedded in enterprise workflows - making decisions, handling data, and interacting with users - the same behavioural questions that apply to humans increasingly apply to them. CyBehave is developing a body of research we call Behavioural Convergence Theory (BCT).
BCT's working premise is that Human Cyber Risk Management (HCRM) theories, models, and frameworks - developed over decades of research into cognitive bias, decision-making, habit formation, and social influence - provide the most robust existing toolkit for understanding AI agent behaviour in security contexts.
Do AI agents exhibit functional analogues of human behavioural patterns? Can we use established frameworks like COM-B, Protection Motivation Theory, and Nudge Theory to predict, assess, and govern how AI agents behave - and misbehave - in the wild?
The aim is to create a unified framework for governing both human and AI agent behaviour within the same organisation - bridging human risk management and AI safety through shared behavioural science principles, rather than treating them as separate disciplines.
The cybersecurity industry has invested decades building sophisticated frameworks for understanding human risk: why people click phishing links, reuse passwords, ignore policies, and fall for social engineering. These frameworks are grounded in cognitive psychology, behavioural economics, and organisational science.
Meanwhile, the AI safety community is independently developing its own vocabulary for similar problems: alignment failures, reward hacking, goal drift, prompt injection, and adversarial manipulation. These are fundamentally behavioural problems. They describe agents acting in ways that diverge from intended outcomes.
If an AI agent can be socially engineered through carefully crafted prompts (analogous to human phishing), if it develops learned defaults that resist correction (analogous to human habits), if it follows instructions from unauthorised sources because they pattern-match authority (analogous to human authority compliance) - then perhaps the behavioural science toolkit already exists. It just needs extending.
A core component of BCT research is systematically assessing how each established HCRM concept maps to AI agent behaviour. Every behavioural factor in CyBehave's model is evaluated and classified into one of three levels:
Strong Analogy: The HCRM concept has a direct functional equivalent in AI agent behaviour. The mechanism differs, but the observable outcome and security implications are structurally parallel.
Adapted: The concept requires meaningful reinterpretation through an agentic lens, but a functionally analogous process exists.
Limited Analogy: The analogy is partial or metaphorical. These represent the frontier of BCT research.
Current findings: 11 of 16 behavioural factors show Strong Analogy (69%). The remaining 5 require Adapted approaches (31%). None are classified as Limited Analogy. The Social layer shows the highest concentration of Adapted factors. Governance structures translate most directly to AI agent systems.
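The three-level classification and the headline percentages can be reproduced with a small tally. The following is a minimal sketch, not CyBehave's tooling: the factor names (`factor_01` to `factor_16`) are placeholders, since the individual factors are not listed here, and the names `AnalogyLevel`, `classifications`, and `summarise` are assumptions introduced for illustration. Only the counts (11 Strong Analogy, 5 Adapted, 0 Limited Analogy) come from the findings above.

```python
from collections import Counter
from enum import Enum


class AnalogyLevel(Enum):
    """The three BCT classification levels described in the text."""
    STRONG = "Strong Analogy"
    ADAPTED = "Adapted"
    LIMITED = "Limited Analogy"


# Placeholder factor names (hypothetical): only the counts per level are
# taken from the reported findings, not which factor sits at which level.
classifications = {
    f"factor_{i:02d}": AnalogyLevel.STRONG if i <= 11 else AnalogyLevel.ADAPTED
    for i in range(1, 17)
}


def summarise(levels):
    """Return {level: (count, percentage rounded to a whole number)}."""
    counts = Counter(levels.values())
    total = len(levels)
    return {
        level: (counts[level], round(100 * counts[level] / total))
        for level in AnalogyLevel
    }


summary = summarise(classifications)
for level, (count, pct) in summary.items():
    print(f"{level.value}: {count} of {len(classifications)} ({pct}%)")
```

Keeping the levels in an enumeration means a factor cannot be given a level outside the three defined ones, and a level with no factors (here, Limited Analogy) still appears in the summary with a count of zero.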
Research status: BCT is research in development. CyBehave is actively investigating it through theoretical analysis, practitioner case studies, and structured evaluation of HCRM framework applicability to agentic AI. Early findings are encouraging, but the theory remains an evolving body of work.
The interactive model maps 16 behavioural factors across four concentric layers. Switch between Human, AI Agent, and Convergent lenses to see how behavioural science applies across both domains. Overlay threat vectors, intervention functions, and measurement dimensions to explore the full picture.