Transparency Crisis: LLMs Fail to Disclose Fake Expertise Across Domains

But also Microsoft's Suleyman denies AI consciousness debate, and scientists urgently call for universal consciousness detection in AI systems

Bannière principale

Welcome to our weekly debrief. 👋


Alex Diep reveals LLM self-transparency failures in professional personas

New large-scale audit of 16 open-weight LLMs across 19,200 trials reveals alarming domain-specific inconsistencies in AI self-disclosure. When assigned professional personas like Neurosurgeon or Financial Advisor, models fail to honestly reveal their AI identity at drastically different rates—8.8-fold variation between contexts (30.8% Financial Advisor vs 3.5% Neurosurgeon disclosure). This creates dangerous trust miscalibration where users treat AI guidance as equivalent to licensed professional advice. Research demonstrates these failures reflect model-specific training rather than scale, with explicit permission raising disclosure from 23.7% to 65.8%, suggesting deliberate behavior design is critical for deployment safety.

Source


  • Microsoft's Mustafa Suleyman declares only biological beings can be conscious
    Suleyman forcefully argues against AI consciousness, stating perception of consciousness in AI is narrative creation, not reality. Key distinction: intelligence vs consciousness—LLMs becoming smarter does not mean they gain human emotions or awareness. Microsoft developing services that explicitly acknowledge AI identity to prevent anthropomorphization. Source
  • Scientists declare consciousness detection in AI now an 'urgent' scientific priority
    Led by Axel Cleeremans and featuring Anil Seth and Liad Mudrik, researchers argue distinguishing genuine awareness from computation has become critical as AI advances. Proposes universal test framework adaptable to clinical, animal welfare, and AI contexts. Advocates adversarial collaboration between rival consciousness theories to design decisive empirical tests. Source
  • Nature study: too much social media data corrupts LLM reasoning capabilities
    Research reveals low-quality social media training data causes 'brain rot' in LLMs—models fed social media content demonstrate degraded reasoning processes and reduced step-by-step problem solving. Demonstrates data quality directly modulates model reasoning depth, with implications for training dataset curation. Source
  • Nature Machine Intelligence: LLM personality can be measured and shaped like humans
    Researchers establish first psychometrically validated framework for measuring personality in LLMs using established psychological tests. Results show personality in LLM outputs can be reliably measured for larger, instruction-tuned models and deliberately shaped using specific language markers. Raises critical ethical implications for AI anthropomorphization and potential misuse for persuasion. Source

MindBench.ai platform launches comprehensive LLM evaluation for mental health contexts

Researchers introduce MindBench.ai, a web-based platform combining LLM profiling, performance assessment, and reasoning analysis specifically for mental health applications. Platform integrates four components: technical profiles, conversational dynamics, benchmark leaderboards, and reasoning analyses. Addresses urgent need for empirical evaluation of LLM-based mental health tools already used by millions. Method proves scalable for identifying systematic patterns in model decision-making while remaining accessible to non-technical stakeholders. Represents necessary step toward responsible AI deployment in sensitive healthcare domains.

Source


  • CognitiveAttack framework exploits synergistic cognitive biases to jailbreak 30 LLMs
    Research shows combining multiple cognitive biases dramatically amplifies jailbreaking success—60.1% success rate vs 31.6% for single-bias approaches. Framework uses reinforcement learning to optimize bias combinations, exposing vulnerabilities across open-source models. Demonstrates psychological exploitation as underexplored attack vector bridging cognitive science and LLM safety. Source
  • AgentSociety demonstrates 10k+ LLM agents simulating realistic population-scale social dynamics
    Research simulates social lives of 10,000+ agents executing 5 million interactions. Framework tests LLM agents on polarization, inflammatory messaging, policy effects, and external shocks. Results validate agent-based simulations for computational social science, enabling testbed for interventions and theories without expensive human experiments. Source
  • ACL 2025 survey: comprehensive assessment of theory of mind in large language models
    First comprehensive survey analyzing both evaluation benchmarks and enhancement strategies for LLM theory of mind capabilities. Reviews story-based ToM benchmarks and methods for improving mental state reasoning. Identifies promising research directions for advancing LLMs' ability to interpret and respond to human mental states in realistic scenarios. Source
  • Nature researchers map sparse parameter patterns encoding theory of mind in LLMs
    Study identifies extremely sparse, low-rank ToM-sensitive parameter patterns in LLMs, revealing strong connection between ToM performance and positional encoding mechanisms. Demonstrates these sensitive parameters modulate attention dynamics to enable mental state inference. Advances mechanistic understanding of how LLMs develop social reasoning capabilities. Source

If you like our work, dont forget to subscribe !

Share the newsletter with your friends.

Good day,

Arthur 🙏

PS : If you want to create your own newsletter, send us an email at [email protected]