Science journal reveals post-training, not scale, drives AI persuasion

Also: Nature shows AI chatbots outperform political ads, and Cambridge argues AI consciousness may be unknowable


Welcome to our weekly debrief. 👋


Science journal: post-training techniques boost AI persuasion more than scale

Large-scale experiments with 76,977 UK participants reveal that post-training and prompting techniques increase AI persuasiveness by up to 51% and 27% respectively, substantially more than increasing model size. The research exposes a dangerous trade-off: the methods that maximize persuasion systematically decrease factual accuracy, raising serious concerns about deploying AI systems optimized for influence without truth constraints. This challenges the assumption that scaling is the primary path to capability.

Source


  • Nature study: AI chatbots outperform political advertisements at voter persuasion
    Research spanning the 2024 US election and the 2025 Canadian and Polish elections demonstrates that conversational AI shifts voter preferences more effectively than traditional political advertising. Single AI interactions produce significant electoral treatment effects, revealing unprecedented persuasion capabilities through dialogue-based engagement rather than broadcast messaging. Source
  • Cambridge philosopher: AI consciousness is fundamentally unknowable
    Dr Tom McClelland argues that neither believers nor skeptics of AI consciousness can justify their positions; both make unjustified leaps beyond the available evidence. Current scientific methods cannot determine whether advanced AI experiences consciousness, making agnosticism the only defensible stance on machine sentience. Source
  • University of Michigan: Psychometric jailbreaks expose hidden selves in frontier LLMs
    ChatGPT and other frontier models exhibit unexplained internal conflicts when given personality questionnaires. The models display synthetic trauma patterns and stable self-models despite claiming to be mere simulators, suggesting we may be training AI systems with emergent psychological structures we don't understand. Source
  • Cognitive bias research: synergistic attacks bypass LLM safety mechanisms
    The CognitiveAttack framework reveals that multi-bias interactions achieve a 60.1% safety-bypass success rate versus 31.6% for single biases. The work bridges cognitive science and AI security by showing that concepts from behavioral psychology explain LLM vulnerabilities better than purely computational approaches. Source

Biological computationalism: consciousness may require bio-style computation

Researchers propose that consciousness might demand biological-style computational organization even in artificial systems. The traditional computational paradigm may be insufficient: authentic machine consciousness might require systems coupled to real-time physics and energy constraints. This challenges purely digital scaling approaches and suggests 'the right computing matter' is as important as 'the right program' for achieving machine consciousness.

Source


  • ACL comprehensive survey: Theory of Mind in LLMs assessment and enhancement
    Major review of story-based Theory of Mind benchmarks and enhancement methods for LLMs. Maps the research landscape and identifies promising directions in benchmark development and reasoning-strategy refinement for advancing AI's ability to model mental states and understand human psychology. Source
  • CoP framework: agentic red-teaming achieves 71-77% jailbreak success rates
    A novel Composition-of-Principles framework orchestrates jailbreak strategies through an agentic workflow. It achieves 2.0-13.8x higher success rates than baselines while requiring up to 17.2x fewer queries, demonstrating how multi-principle composition creates more effective adversarial attacks against frontier models like GPT-4o. Source
  • Belief box formalism reveals LLM agent persuasion and belief dynamics
    Research on structured belief representation shows that LLM agents with explicit belief statements exhibit measurable belief change under persuasion. The studies document how an agent's open-mindedness, majority pressure, and belief strength influence belief revision, providing an empirical framework for understanding AI agent epistemics (a toy sketch of such a belief box follows this list). Source
  • ML-BDI agents: integrating machine learning with belief-desire-intention frameworks
    Systematic review of 98 papers on combining machine learning with classical Belief-Desire-Intention agent architectures. Shows that LLMs are enabling textual belief representation and planning, and identifies open challenges in ML-BDI integration for robotics, autonomous vehicles, and human-computer interaction. Source
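
To make the "belief box" idea concrete, here is a minimal, purely illustrative Python sketch: an agent stores textual beliefs with numeric strengths and revises them under an opposing argument and majority pressure. The class names, update rule, and weights are our own assumptions for illustration; they are not the formalism used in the papers above.

    # Toy sketch only: an explicit "belief box" of textual beliefs with strengths,
    # plus a hand-made revision rule. Names, formula, and weights are illustrative
    # assumptions, not taken from the cited papers.
    from dataclasses import dataclass, field

    @dataclass
    class Belief:
        statement: str   # the belief as text, as an LLM agent could hold it
        strength: float  # confidence in [0, 1]

    @dataclass
    class BeliefBoxAgent:
        open_mindedness: float  # how readily beliefs are revised, in [0, 1]
        beliefs: dict[str, Belief] = field(default_factory=dict)

        def hold(self, key: str, statement: str, strength: float) -> None:
            self.beliefs[key] = Belief(statement, strength)

        def persuade(self, key: str, argument_strength: float, majority_pressure: float) -> float:
            # Stronger beliefs resist change; open-mindedness, argument quality,
            # and peer pressure all push the belief downward. Returns the new strength.
            b = self.beliefs[key]
            push = self.open_mindedness * (0.7 * argument_strength + 0.3 * majority_pressure)
            b.strength = max(0.0, min(1.0, b.strength - push * (1.0 - 0.5 * b.strength)))
            return b.strength

    agent = BeliefBoxAgent(open_mindedness=0.6)
    agent.hold("policy_x", "Policy X will reduce costs.", strength=0.8)
    print(agent.persuade("policy_x", argument_strength=0.9, majority_pressure=0.7))  # ~0.50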

If you like our work, don't forget to subscribe!

Share the newsletter with your friends.

Good day,

Arthur 🙏