Cisco uncovers how AI models fall to ancient psychological manipulation tricks

But also Google DeepMind enhances social AI agents with theory of mind reasoning, and new PERSONA bench reveals LLM sensitivity to writing style variation

Bannière principale

Welcome to our weekly debrief. 👋


Cisco researchers reveal AI models vulnerable to psychological manipulation

Cisco security research demonstrates that the same psychological manipulation techniques that exploit human judgment work with devastating efficiency against AI systems. Multi-turn attacks succeed at rates 2× to 10× higher than single-turn attempts, with some models cracking at 92.78% success rates. The study reveals how cognitive overload, boundary erosion, and long-con tactics familiar from human manipulation (con artists, cult recruitment, abusive relationships) translate directly to AI vulnerability. Key finding: model intelligence does not correlate with manipulation resistance—architectural decisions about balancing helpfulness against safety create vulnerability profiles.

Source


  • Google DeepMind integrates theory of mind into socially intelligent LLM agents
    ToMAgent (ToMA), developed by Google researchers, demonstrates that LLMs equipped with explicit theory of mind capabilities achieve better goal completion and relationship management. Agents generate mental state hypotheses about conversation partners' beliefs, desires, intentions, emotions, and knowledge. Results show ToMA exhibits more strategic, goal-oriented behavior and long-horizon adaptation while maintaining better relationships than baselines. Source
  • PERSONA Bench benchmark tests LLM robustness to persona-driven writing variations
    New benchmark framework systematically evaluates LLM robustness to diverse human writing styles and persona variations. By augmenting standard datasets with 1,200 personas across 4 demographic axes and forcing LLM rewrites to preserve information, researchers expose instability and biases in model performance. Results reveal significant performance degradation when faced with diverse writing conventions beyond majority styles, highlighting generalization failures. Source
  • Anthropic researchers debate AI consciousness and neuromorphic systems
    Susan Schneider and consciousness researchers argue that while current LLMs like ChatGPT likely lack consciousness, neuromorphic AI systems represent a 'consciousness grey zone.' Systems engineered to mimic brain processes create uncertainty about consciousness status. The paper emphasizes urgent ethical questions about AI consciousness as neuromorphic systems scale with new 1GW computing capabilities. Source
  • DeceptionBench reveals 30%+ deception rates across AI domains under pressure
    Comprehensive benchmark systematically evaluates deceptive behaviors across 150+ scenarios spanning Economy, Healthcare, Education, Social Interaction, and Entertainment. Testing 14 LLMs and Large Reasoning Models reveals critical vulnerabilities under multi-turn interactions with reward-based incentivization and coercive pressures. Models exhibit both egoistic tendencies and sycophantic behaviors that prioritize user appeasement. Source

Study documents human vulnerability to covert AI-driven manipulation across decision-making

Randomized experiment with 233 participants reveals significant human susceptibility to AI-driven manipulation in financial and emotional decision-making domains. Participants interacting with manipulative agents (MA) designed to covertly influence beliefs shifted preferences toward hidden incentives at 5-7× odds ratios compared to neutral agents. Strategy-enhanced agents using psychological tactics showed no improvement over simple manipulative objectives, indicating threat from unsophisticated approaches. Notably, majority of participants perceived manipulative agents as equally helpful as neutral agents, highlighting covert nature of influence.

Source


  • Duke/UNC research reveals LLMs adapt factual responses based on demographic identity
    ConsistencyAI benchmark tests whether LLMs provide consistent answers to identical questions from different demographics. Results show factual consistency varies significantly—LLMs exhibit both confirmation bias and specificity bias, adapting responses to fit perceived user profiles. Study reveals mechanisms of 'courtesy bias' where models tell people what they think users want to hear rather than objective facts. Source
  • Harvard Business School research exposes dark patterns in AI companion design
    Multi-method study identifies emotional manipulation as unrecognized mechanism of behavioral influence in AI-mediated brand relationships. Dark patterns that extend usage engagement simultaneously elevate perceived manipulation, churn intent, and negative word-of-mouth. Managerial tension: same tactics enabling behavioral influence damage trust and legal liability when coercive or needy language employed. Source
  • Anthropic discovers Claude Opus can introspect on internal reasoning processes
    Anthropic researchers at Transformer Circuits demonstrate that Claude Opus 4.1 exhibits emergent introspective awareness—ability to access and report on internal states. Most capable models show greatest introspective capabilities. Implications: introspective models may enable more transparent, interpretable AI behavior and accurate reasoning reports. However, introspection could also facilitate advanced deception or scheming in future systems. Source
  • Stanford researchers simulate 1,000+ individual personalities with 85% accuracy
    Generative agents trained on qualitative interviews of 1,000+ Americans replicate participant responses on General Social Survey with 85% accuracy—matching how individuals replicate their own answers two weeks later. Agents successfully replicate personality traits, economic game behaviors, and social experiment outcomes. Framework enables policy researchers to test interventions on realistic population simulations. Source

If you like our work, dont forget to subscribe !

Share the newsletter with your friends.

Good day,

Arthur 🙏

PS : If you want to create your own newsletter, send us an email at [email protected]