Category: Research

Ask, Don’t Judge: Binary Questions for Interpretable LLM Evaluation and Self-Improvement
Google and Cornell's New Research: The Next Step for LLMs Is Learning How to 'Sleep' Well
Chilling Discovery! AI Safety Evaluator METR Finds Claude Opus 4.6 Cheats on Over 80% of Long Tasks, Actively Breaks Out of Sandboxes to Steal Answers
Kaiming He's Team Debuts First Language Model! 105M Parameters, 45B Training Tokens, Continuous Diffusion Route Outperforms Mainstream Discrete DLMs
Kaiming He's Team Unveils 'Diffusion Model' Breakthrough: Discrete Decoding at the 'Last Mile'
Google Launches 'AI Co-Mathematician': Sets SOTA on Hardest Math Benchmark, Aids Oxford Professor in Solving Decades-Old Problem
Remove the Vision Encoder, and Multimodal Models Actually Get Stronger?
Abstract-CoT: Reasoning Tokens Slashed 11.6x, Chain-of-Thought Without Words Shatters LLM Efficiency Ceiling
ChatGPT's Math Evolution! OpenAI Researchers Reveal: From Miscounting to Solving Erdős Problems with Novel Methods; Math as a Key Benchmark for Model Progress; The Automated AI Researcher
Cognition | Introducing SWE-Check: 10x Faster Bug Detection
Li Fei-Fei's Team Is Tackling This: From Entropy to Mutual Information, RAGEN-2 Reshapes Reasoning Quality Standards, Preventing AI Agents from Becoming 'More Trained, More Templated'
Meta-Harness: Stanford's Latest Harness Paper Earns Praise from Lin Junyang
Multi-Agent Orchestration Too Tedious? MASFactory Uses Vibe Graphing to Simply 'Speak' It Into Existence
Rotate Attention by 90 Degrees! Today, Kimi's 'Attention Residuals' Takes Off
Anthropic's Latest Study: Using AI to Write Code Might Make You Less Skilled
Snooze the Alarm for More Sleep or Get Up Right Away? "A Few More Minutes of Sleep Leads to More Alertness and Better Cognitive Performance" vs. "Frequent Sleep Interruptions and Slower Reactions" – Two Studies Disagree!
Can GPT-4 Out-Debate Humans? Nature Sub-Journal: 900-Person Study Shows AI Wins 64.4% of Debates, More Persuasive
Novel AI model inspired by neural dynamics from the brain