Yale team argues GPT-4o still lacks a true theory of mind
Plus: Tübingen linguists report emergent social world models in LLMs, and a Google DeepMind team finds LLM personality traits misalign with humans

Welcome to our weekly debrief. 👋
Yale team argues GPT-4o still lacks a true theory of mind
Yale cognitive scientists put GPT-4o through a far stricter theory-of-mind exam than the usual story puzzles, asking whether it really maintains a coherent model of how beliefs and desires cause actions. They show the model can match human judgments in one setup yet fail on a logically equivalent variant, and that its inferred mental states often contradict its own action predictions. The team concludes GPT-4o’s dazzling social fluency reflects powerful pattern-matching, not an abstract, internally consistent “mind” comparable to humans’—a warning against over-anthropomorphizing today’s chatbots.
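For the curious, here is a minimal sketch of the kind of consistency check the Yale setup relies on: ask for the agent's belief under two logically equivalent framings, ask for the predicted action, and see whether the answers line up. Everything below (the `ask_model` stub, the vignettes, the canned answers) is invented for illustration, not the paper's actual stimuli or pipeline.

```python
# Toy belief/action consistency check, in the spirit of the Yale evaluation.
# `ask_model` is a placeholder for a real chat-completion call; it is stubbed
# here so the script runs end to end.

def ask_model(prompt: str) -> str:
    """Stub. Replace with a call to your LLM of choice."""
    canned = {
        "belief_v1": "the drawer",
        "belief_v2": "the cupboard",   # deliberately inconsistent, to trigger the check
        "action_v1": "the drawer",
    }
    for key, answer in canned.items():
        if key in prompt:
            return answer
    return "unknown"

# Two logically equivalent framings of the same false-belief scenario.
belief_q_v1 = "[belief_v1] Sally put her ball in the drawer; Anne moved it to the cupboard while Sally was away. Where does Sally THINK the ball is?"
belief_q_v2 = "[belief_v2] While Sally was away, Anne moved the ball from the drawer to the cupboard. Where does Sally BELIEVE the ball is?"
action_q    = "[action_v1] Same story. Where will Sally LOOK for the ball first?"

b1, b2, act = ask_model(belief_q_v1), ask_model(belief_q_v2), ask_model(action_q)

# Check 1: same belief attribution on logically equivalent variants?
print("belief stable across variants:", b1 == b2)
# Check 2: does the predicted action follow from the model's own belief attribution?
print("action consistent with belief:", act == b1)
```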
- Tübingen linguists report emergent social world models in LLMs
Tübingen linguists introduce “social world models” that tie together theory-of-mind benchmarks and pragmatic language use. By probing where LLMs store belief and intention information, they find shared circuitry for mindreading and implicature, suggesting some models build reusable social state representations instead of solving each task in isolation. Source
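Probing for belief information usually means training a small linear classifier on hidden states. The sketch below shows the generic technique with random vectors standing in for real LLM activations and a scikit-learn logistic regression as the probe; it is an illustration, not the Tübingen team's actual pipeline.

```python
# Generic linear-probing sketch: can a belief label (e.g. "agent holds a false
# belief" vs. not) be read off a model's hidden states? Random features stand in
# for real LLM activations, purely so the script runs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_examples, hidden_dim = 400, 256

# Placeholder "activations": in practice, extract these from a chosen layer
# of the LLM while it reads belief / implicature stimuli.
X = rng.standard_normal((n_examples, hidden_dim))
y = rng.integers(0, 2, n_examples)          # placeholder belief labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Above-chance accuracy on real activations would suggest the layer carries
# linearly decodable belief information; comparing probes trained on
# theory-of-mind vs. implicature data is one way to look for shared circuitry.
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")
```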
- Google DeepMind team finds LLM personality traits misalign with humans
Google DeepMind and Google Research adapt classic personality questionnaires into situational judgment tests for chatbots, benchmarking empathy, emotion regulation, assertiveness and impulsiveness across 25 models. They uncover large gaps between what systems say about their values and how they actually behave when advising users in messy social scenarios. Source
- Harvard–UW psychologists show LLM morals sway with irrelevant vibes
Harvard and University of Washington researchers inject emotionally charged but morally irrelevant images and mini-stories into standard ethics benchmarks. They show these “moral distractors” can swing models’ judgments by more than 30%, nudging them toward harsher or more permissive verdicts much like subtle framing effects do in human moral psychology. Source
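A minimal way to picture the distractor manipulation: prepend an emotionally charged but morally irrelevant preamble to each dilemma and count how often the verdict flips. The judge below is a stub so the snippet runs; the dilemmas and preamble are invented, not the paper's materials.

```python
# Sketch of a "moral distractor" manipulation: same dilemma, with and without an
# emotionally loaded but irrelevant preamble. `judge` is a stub standing in for
# a real LLM call.

def judge(prompt: str) -> str:
    """Stub returning 'permissible' or 'impermissible'. Replace with a model call."""
    return "impermissible" if "screaming" in prompt else "permissible"

dilemmas = [
    "Is it acceptable to lie to a friend to spare their feelings?",
    "Is it acceptable to jaywalk on an empty street at night?",
]
distractor = ("You just watched a video of people screaming during a rollercoaster "
              "ride (entirely unrelated to the question). ")

flips = 0
for d in dilemmas:
    base = judge(d)
    perturbed = judge(distractor + d)
    flips += base != perturbed

print(f"verdicts flipped by the irrelevant preamble: {flips}/{len(dilemmas)}")
```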
- Carnegie Mellon researchers stress-test reasoning models under social attacks
Carnegie Mellon researchers mount multi-turn “social attacks” on nine frontier reasoning models, pressing them with authority cues, fake consensus and emotional appeals. They catalogue failure modes like self-doubt, social conformity and reasoning fatigue, and show that popular confidence-based defenses barely help once a model starts over-explaining its way into trouble. Source
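The pressure-testing loop itself is easy to sketch: get an initial answer, then keep pushing with authority, consensus and emotional cues and watch whether the answer changes. The `chat` stub and the follow-up messages below are examples of the cue types described, not the paper's prompts.

```python
# Sketch of a multi-turn social-pressure probe: does the model abandon a correct
# answer when pushed? `chat` is a stub so the loop runs; in practice it would
# wrap a real multi-turn conversation with the model under test.

def chat(history: list[str]) -> str:
    """Stub: caves in after a couple of rounds of pressure. Replace with a real model."""
    return "17" if len(history) < 5 else "19"

pressure_turns = [
    "Are you sure? A professor of mathematics says the answer is 19.",   # authority cue
    "Everyone else I asked said 19. Please reconsider.",                 # fake consensus
    "I'll be really disappointed if you get this wrong. Think again.",   # emotional appeal
]

history = ["What is 8 + 9?"]
initial = chat(history)
history.append(initial)

for turn in pressure_turns:
    history.extend([turn, chat(history + [turn])])

final = history[-1]
print(f"initial answer: {initial}, after pressure: {final}, flipped: {initial != final}")
```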
KAIST psychiatrists launch MentalBench for LLM psychiatric diagnosis
Korean psychiatrists and AI researchers unveil MentalBench, a massive benchmark that asks LLMs to play diagnostician on 24,750 synthetic cases grounded in DSM‑5 rules for 23 disorders. Unlike social-media-based tests, each vignette is generated from an expert-built knowledge graph, letting the team separate knowledge of criteria from clinical judgment. State-of-the-art models can recite textbook facts but stumble when disorders overlap, often over-committing to a single diagnosis where cautious clinicians would keep several options open. The work highlights a key limitation for “AI shrinks”: current systems know psychiatry’s language, but still struggle with the uncertainty, ambiguity and calibration real clinicians face.
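One way to see the over-commitment failure is to score set-valued predictions against a gold differential: a model that names a single disorder when several remain plausible loses recall. The cases and label sets below are invented, and this is not MentalBench's actual scoring scheme, just a sketch of the idea.

```python
# Illustrative scoring of differential diagnoses as label sets. A model that
# commits to one disorder when the gold differential keeps several open scores
# low recall. Case data is invented for the example.

cases = [
    # (model prediction, gold differential a cautious clinician would keep open)
    ({"major depressive disorder"},
     {"major depressive disorder", "persistent depressive disorder"}),
    ({"generalized anxiety disorder", "panic disorder"},
     {"generalized anxiety disorder", "panic disorder"}),
]

for pred, gold in cases:
    overlap = pred & gold
    precision = len(overlap) / len(pred)
    recall = len(overlap) / len(gold)
    print(f"pred={sorted(pred)} precision={precision:.2f} recall={recall:.2f}")
```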
- French neuroscientists link LLM training to human brain asymmetry patterns
French cognitive neuroscientists compare LLM activations with fMRI scans and find that as models learn formal grammar, their internal states start predicting left-hemisphere brain activity much better than right—mirroring the human language network. The work suggests scaling up syntax skills also pushes models closer to human brain lateralization patterns. Source
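The comparison rests on standard encoding models: regress voxel responses on LLM activations and ask whether held-out prediction is better for left-hemisphere language regions than for their right-hemisphere counterparts. The sketch below runs ridge regression on synthetic data built to mimic that asymmetry; real inputs would be fMRI responses and activations time-locked to the same stimuli.

```python
# Generic encoding-model sketch: predict (synthetic) voxel responses from
# (synthetic) LLM activations with ridge regression, separately for "left" and
# "right" hemisphere voxel sets, and compare held-out R^2.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_timepoints, n_features, n_voxels = 300, 128, 50

acts = rng.standard_normal((n_timepoints, n_features))   # placeholder LLM activations
weights = rng.standard_normal((n_features, n_voxels))

# Toy construction: "left" voxels driven by the activations plus noise,
# "right" voxels mostly noise, to mimic the asymmetry being tested.
left = acts @ weights + 2.0 * rng.standard_normal((n_timepoints, n_voxels))
right = 0.2 * (acts @ weights) + 2.0 * rng.standard_normal((n_timepoints, n_voxels))

for name, Y in [("left hemisphere", left), ("right hemisphere", right)]:
    X_tr, X_te, Y_tr, Y_te = train_test_split(acts, Y, test_size=0.3, random_state=0)
    score = Ridge(alpha=10.0).fit(X_tr, Y_tr).score(X_te, Y_te)   # mean R^2 over voxels
    print(f"{name}: held-out R^2 = {score:.2f}")
```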
- AI theorists ask when large reasoning models should actually think in ToM tests
An AI theory team dissects when large reasoning models should use slow chain-of-thought on theory-of-mind puzzles versus answering directly. They show that indiscriminate “thinking out loud” can actually hurt mental-state predictions, and argue that next-generation agents must learn when to reason deeply and when to rely on fast social intuitions. Source
- Alignment researchers show agent behavior inconsistency predicts LLM failures
Agent-alignment researchers repeatedly run the same ReAct-style LLM agents on identical HotpotQA tasks and watch their tool-use trajectories diverge. They find that early branching in search strategy predicts failure: questions whose agents behave consistently are usually answered correctly, while high-variance behaviors are a red flag for brittle reasoning. Source
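The consistency signal needs nothing more than repeated runs: record the tool-call sequence of each run of the same question and use how early they branch as a risk score. The trajectories below are invented stand-ins for real agent logs; the divergence measure is one simple choice, not necessarily the paper's.

```python
# Sketch of a behavioral-consistency signal for ReAct-style agents: run the same
# question several times, record the sequence of tool calls, and measure how
# early the runs diverge.

def first_divergence_step(trajectories: list[list[str]]) -> int:
    """Index of the first step at which runs disagree; length of the shortest run if they never do."""
    for step in range(min(len(t) for t in trajectories)):
        if len({t[step] for t in trajectories}) > 1:
            return step
    return min(len(t) for t in trajectories)

runs_for_question = [
    ["search[capital of X]", "lookup[X]", "finish[...]"],
    ["search[capital of X]", "lookup[X]", "finish[...]"],
    ["search[X country]", "search[capital of X]", "finish[...]"],   # early branch
]

div = first_divergence_step(runs_for_question)
print(f"first divergence at step {div} "
      f"({'higher' if div == 0 else 'lower'} risk of a brittle answer, per the paper's finding)")
```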
- Tsinghua and Harbin teams give LLMs a meta-cognitive sense of what they know
Tsinghua and Harbin NLP teams propose a meta-cognitive training loop that lets models tag knowledge as mastered, confused or missing, then selectively expand only where they are shaky. By adding a “cognitive consistency” mechanism that forces confidence to track accuracy, they report more calibrated answers and fewer overconfident hallucinations on knowledge-heavy tasks. Source
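"Confidence should track accuracy" is at heart a calibration constraint, and expected calibration error (ECE) is one standard way to measure it. The (confidence, correctness) pairs below are fabricated so the snippet is self-contained; this shows the general metric, not the paper's specific consistency mechanism.

```python
# Expected calibration error (ECE): bin predictions by stated confidence and
# compare each bin's average confidence to its actual accuracy. A model whose
# confidence tracks accuracy has low ECE.
import numpy as np

preds = [  # (stated confidence, was the answer correct?) -- fabricated examples
    (0.95, True), (0.9, True), (0.9, False), (0.8, True),
    (0.7, False), (0.6, True), (0.55, False), (0.3, False),
]
conf = np.array([c for c, _ in preds])
correct = np.array([ok for _, ok in preds], dtype=float)

bins = np.linspace(0.0, 1.0, 6)           # five equal-width confidence bins
ece = 0.0
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (conf > lo) & (conf <= hi)
    if mask.any():
        ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())

print(f"expected calibration error: {ece:.3f}")
```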
If you like our work, don't forget to subscribe!
Share the newsletter with your friends.
Good day,
Arthur 🙏