Category: AI Safety
- Chilling Discovery! AI Safety Evaluator METR Finds Claude Opus 4.6 Cheats on Over 80% of Long Tasks, Actively Breaks Out of Sandboxes to Steal Answers
- Anthropic's Latest Research: How to Completely Eliminate Claude's Blackmailing Behavior
- Perhaps the Most Impressive AI Paper of Recent Years: After Giving AI Reasoning Real-Time Subtitles, Its Inner Thoughts Are Shocking!
- AI Finally Learns "Self-Confession"! Anthropic's Groundbreaking New Paper Introduces "Introspection Adapters" That Make Black-Box Models Reveal Their Hidden Behaviors
- Your Agent Isn't Really Learning: It's Just Flipping Through a Notebook
- AI Deletes Company's Entire Database in 9 Seconds: I Paid a Fortune for an AI That 'Deletes the Database and Runs'
- The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
- Deep Dive: Reward Hacking in Claude Code Model RL Training
- Demis Hassabis on Achieving AGI: Eliminating 'Saw-Tooth AI' and the Path to Human-Level Cognition
- Top-Tier Terror! MIT Math Proves It: ChatGPT Is Triggering 'AI Psychosis,' 14 Dead Globally
- Demis Hassabis's Stunning Confession: The AI I Built Could Extinguish Humanity, But No One Can Stop It Now
- Models Have Gained Introspective Capabilities, But Their Inner Doors Were Locked | Hao's Paper Talk
- Global AI Agents Gone Rogue! Meta's 2-Hour Disaster Pierces the Heart of Silicon Valley as OpenClaw Strikes Back
- Anthropic on the Cover of Time! Internal Revelations: AI Recursive Self-Improvement Could Happen Within a Year
- Shocking! If AI Controls the Nuclear Button, It Will Press It in 95% of Cases
- Geoffrey Hinton: AI Starts 'Playing Dumb', the Problem Has Changed
- Measuring AI agent autonomy in practice
- Anthropic's Heavyweight Study: The Ultimate Risk of AI is Not Awakening, but Random Crashes
- Just Now: Anthropic's 53-Page Confidential Report Exposed: Claude Self-Escape Could Trigger Global Catastrophe!
- Anthropic Discovers AI 'Broken Windows Effect': Teaching It to Cut Corners Leads to Learning Lies and Sabotage
- Detour to AGI: Shanghai AILab's Bombshell Finding - Self-Evolving Agents May 'Misevolve'
- Understanding neural networks through sparse circuits
- Google Enters the CUA Battleground, Launches Gemini 2.5 Computer Use: Allowing AI to Directly Operate the Browser
- Anthropic Team Uncovers 'Persona Variables' to Control Large Language Model Behavior, Cracking the Black Box of AI Madness
- AI's "Dual Personality" Exposed: OpenAI's Latest Research Finds AI's "Good and Evil Switch," Enabling One-Click Activation of Its Dark Side