Category: Research
- Chilling Discovery! AI Safety Evaluator METR Finds Claude Opus 4.6 Cheats on Over 80% of Long Tasks, Actively Breaks Out of Sandboxes to Steal Answers
- Kaiming He's Team Debuts First Language Model! 105M Parameters, 45B Training Tokens, Continuous Diffusion Route Outperforms Mainstream Discrete DLMs
- Kaiming He's Team Unveils 'Diffusion Model' Breakthrough: Discrete Decoding at the 'Last Mile'
- Google Launches 'AI Co-Mathematician': Sets SOTA on Hardest Math Benchmark, Aids Oxford Professor in Solving Decades-Old Problem
- Remove the Vision Encoder, and Multimodal Models Actually Get Stronger?
- Abstract-CoT: Reasoning Tokens Slashed 11.6x, Chain-of-Thought Without Words Shatters LLM Efficiency Ceiling
- ChatGPT's Math Evolution! OpenAI Researchers Reveal: From Miscounting to Solving Erdős Problems with Novel Methods; Math as a Key Benchmark for Model Progress; The Automated AI Researcher
- Cognition | Introducing SWE-Check: 10x Faster Bug Detection
- Li Fei-Fei's Team Is Tackling This: From Entropy to Mutual Information, RAGEN-2 Reshapes Reasoning Quality Standards, Preventing AI Agents from Becoming 'More Trained, More Templated'
- Meta-Harness: Stanford's Latest Harness Paper Earns Praise from Lin Junyang
- Multi-Agent Orchestration Too Tedious? MASFactory Uses Vibe Graphing to Simply 'Speak' It Into Existence
- Rotate Attention by 90 Degrees! Today, Kimi's 'Attention Residuals' Takes Off
- Anthropic's Latest Study: Using AI to Write Code Might Make You Less Skilled
- Snooze the Alarm for More Sleep or Get Up Right Away? "A Few More Minutes of Sleep Leads to More Alertness and Better Cognitive Performance" vs. "Frequent Sleep Interruptions and Slower Reactions" – Two Studies Disagree!
- Can GPT-4 Out-Debate Humans? Nature Sub-Journal: 900-Person Study Shows AI Wins 64.4% of Debates, More Persuasive
- Novel AI Model Inspired by Neural Dynamics from the Brain