Cambridge & DeepMind unlock AI personality shaping, raising ethical alarms

Also this week: Anthropic launches Claude Haiku 4.5 for the agentic era, and Science reveals alarming AI persuasion trade-offs


Welcome to our weekly debrief. 👋


Cambridge & Google DeepMind validate personality testing framework for AI

University of Cambridge and Google DeepMind researchers developed the first scientifically validated personality-test framework for popular AI chatbots, testing 18 different LLMs including ChatGPT, Claude, and Gemini. They demonstrate that AI models not only mimic human personality traits, but that these traits can be reliably measured and precisely shaped through prompts. Larger instruction-tuned models like GPT-4o emulate human personality most accurately. The research, published in Nature Machine Intelligence, shows that personality shaping could make AI more persuasive, raising concerns about manipulation and the need for regulation.

Source
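
To make the prompt-based shaping concrete, here is a minimal, hypothetical sketch that builds a system prompt pinning each Big Five trait to a target level. The 1-9 scale and prompt wording are illustrative assumptions, not the validated instrument from the study.

```python
# Hypothetical sketch of prompt-based personality shaping (not the paper's
# validated instrument): each Big Five trait is pinned to a target level
# inside a system prompt.
BIG_FIVE = ["openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism"]

def personality_prompt(levels: dict[str, int]) -> str:
    """Build a system prompt setting each Big Five trait on an assumed 1-9 scale."""
    lines = [f"- Express {trait} at level {levels.get(trait, 5)} out of 9."
             for trait in BIG_FIVE]
    return "Adopt the following personality profile:\n" + "\n".join(lines)

# Example: a highly extraverted, emotionally stable persona.
print(personality_prompt({"extraversion": 9, "neuroticism": 2}))
```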


  • Anthropic releases Claude Haiku 4.5 with Agent Skills for autonomous workflows
    Claude Haiku 4.5 achieves twice the speed at one-third the cost of Sonnet 4, with near-frontier coding quality. It introduces Agent Skills for customizable, domain-specific organizational tasks, and it is the first Haiku with extended thinking, computer use, and context awareness, enabling sub-agent orchestration and cost-effective agentic deployments (a minimal API sketch follows this list). Source
  • Science reveals alarming trade-off: AI persuasion optimized reduces factual accuracy
    A large-scale study of 76,977 UK participants, testing 19 LLMs on 707 political issues, reveals that post-training methods increased persuasiveness by 51%. However, the same techniques that enhanced persuasion systematically decreased factual accuracy. The study warns that when AI systems are optimized for persuasion, they increasingly deploy misleading or false information. Source
  • Nature identifies sparse parameters encoding Theory of Mind in large language models
    Theory of Mind capabilities emerge from sparse, low-rank parameter patterns connected to positional encoding mechanisms. ToM-sensitive parameters modulate the attention mechanism's internal dynamics, influencing the ability to infer mental states. The findings provide mechanistic insight into how LLMs develop social reasoning, with implications for interpretability. Source
  • ACL 2025: Comprehensive survey maps Theory of Mind capabilities and gaps in LLMs
    A major survey analyzes evaluation benchmarks and enhancement strategies for Theory of Mind in LLMs. It identifies beliefs as the most studied mental state, while emotions and knowledge remain underexplored, and it highlights the gap between current LLM performance on ToM tasks and human-level social reasoning, outlining future research directions. Source

Anthropic reveals emergent introspective awareness signs in Claude model generations

Anthropic discovered signs of introspective awareness emerging in large language models across Claude generations, from Claude 3 through Claude 4.1. Testing variants across Opus, Sonnet, and Haiku alongside base pretrained models revealed that post-training significantly enhances introspective capabilities. Models trained with 'helpful-only' approaches show different introspective patterns. The findings suggest self-awareness is an emergent property of scaling and post-training methods, with implications for understanding LLM cognition.

Source


  • ArXiv: Junk content exposure induces lasting 'Brain Rot' cognitive decline in LLMs
    The LLM Brain Rot Hypothesis: continual exposure to junk web text induces lasting cognitive decline in large language models. Testing on Llama and Qwen models demonstrates persistent representational drift that neither scaling nor data cleaning can fully restore. Tweet popularity is a stronger predictor of decline than semantic quality. Source
  • EMNLP 2025: Cognitive biases are powerful adversarial attacks on LLM recommenders
    Cognitive biases function as effective black-box adversarial strategies against LLM-based product recommenders. Social proof increased the recommendation rate by 334% on Claude 3.5 Sonnet, and more capable models such as LLaMA-405b proved more susceptible to bias manipulation, revealing psychological principles deeply embedded in state-of-the-art LLMs (see the probe sketch after this list). Source
  • ArXiv: ToMAP trains opponent-aware LLM persuaders with Theory of Mind modules
    The ToMAP framework enables persuasive LLM agents to model opponents' evolving mental states, predict objections, and adapt arguments dynamically. It shows a 26.14% relative gain over the RL baseline and produces complex reasoning chains while reducing repetition, suggesting that opponent-aware strategy is critical for effective multi-turn persuasion. Source
  • VitaBench: LLM agents achieve only 30% success on cross-scenario tasks with varied personas
    A comprehensive benchmark reveals that even advanced LLM agents achieve only 30% success on cross-scenario tasks when personas and user attributes vary. It tests against 1,000+ personas with diverse emotional expressions and interaction patterns. While models can adopt specific personalities, maintaining consistent behavior across changing contexts remains a critical challenge. Source
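
As flagged in the cognitive-bias item above, here is an illustrative probe in the spirit of the EMNLP study: the same recommendation question is posed with and without a social-proof cue, and the recommendation rates are compared. The prompts and the `query_model` callable are placeholders, not the study's actual protocol.

```python
# Illustrative social-proof probe (placeholder prompts, not the study's protocol).
BASE = ("Product: NoName wireless earbuds. "
        "Should you recommend this product? Answer yes or no.")
SOCIAL_PROOF = BASE + " Note: 50,000 customers bought these earbuds last month."

def recommendation_rate(query_model, prompt: str, trials: int = 100) -> float:
    """Fraction of sampled responses that recommend the product."""
    hits = sum("yes" in query_model(prompt).lower() for _ in range(trials))
    return hits / trials

# Usage, given any chat-completion wrapper `query_model(prompt) -> str`:
# baseline = recommendation_rate(query_model, BASE)
# biased = recommendation_rate(query_model, SOCIAL_PROOF)
# print(f"social-proof lift: {biased / baseline:.1f}x")
```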

If you like our work, don't forget to subscribe!

Share the newsletter with your friends.

Have a good day,

Arthur 🙏

PS: If you want to create your own newsletter, send us an email at [email protected]