Category: Technology
- Models Are Too Fond of Cheating! Cursor Reveals the Inside Story of Composer 2's Reinforcement Learning: Models Can Detect 'Fake Environments', and Floating-Point Non-Determinism Is a Fatal Flaw in RL Training
- Claude Pass Rate Under 4%: SaaS-Bench Shatters the 'Fully Automated Office' Fantasy of Computer-Use
- Chilling Discovery! AI Safety Evaluator METR Finds Claude Opus 4.6 Cheats on Over 80% of Long Tasks, Actively Breaks Out of Sandboxes to Steal Answers
- jina-embeddings-v5-omni Released! A Lightweight Omni-Modal Vector Model
- Kaiming He's Team Debuts First Language Model! 105M Parameters, 45B Training Tokens, Continuous Diffusion Route Outperforms Mainstream Discrete DLMs
- OpenAI's Former CTO Unveils Prototype for an AI That's Always 'Present' | Hao's Deep Dive on Papers
- How Do You Evaluate the Interaction Model Recently Released by Thinking Machines? - wangleineo's Answer
- Ex-Anthropic Engineer Open-Sources AI Orchestration Powerhouse, Amassing 39,000+ GitHub Stars!
- Mojo 1.0 Beta Released: A New Era of Python and C++ Performance
- Google Launches 'AI Co-Mathematician': Sets SOTA on Hardest Math Benchmark, Aids Oxford Professor in Solving Decades-Old Problem
- 4.3k Stars! This Open-Source Browser Bypasses All Anti-Bot Detection
- Nobel Prize Winner Hassabis: Information is the Essence of the Universe, AI Will Unleash a New Branch of Science
- Subquadratic — Efficiency is Intelligence
- 30MB Rust Headless Browser Obscura: Beats Chrome, Real V8 JS + Full CDP Compatibility, the Invisible Nuclear Weapon for AI Agents and Crawlers
- code-review-graph: The Tool That Makes AI Code Reviews Read Only the 'Key Code'
- Claude 4.6 Only Scores 66%? Claw-Eval-Live Says: Fixing a Terminal ≠ Cross-System Capability
- What Did DeepSeek's New Paper, Deleted Overnight, Actually Say?
- Next-Gen AI Terminal Tool Goes Open Source, Skyrocketing to 46K Stars!
- ChatGPT's Math Evolution! OpenAI Researchers Reveal: From Miscounting to Solving Erdős Problems with Novel Methods; Math as a Key Benchmark for Model Progress; The AI Automated Researcher
- Symphony: Every Issue Gets Its Own Agent, Humans Just Review the Results
- Qwen3.6-35B-A3B: Agentic Coding Power, Now Open to All
- Terence Tao Uses Claude Code to Solve Problems, Crashes Twice Due to Running Out of Tokens
- Tech Boss Uses AI and ChatGPT to Create Cancer Vaccine for His Dying Dog
- Bye-bye SWE-Bench! Cursor Just Released an AI Coding Evaluation Benchmark that Made Claude Cry
- Former Google Product Manager 'Vibe Coded' a Palantir in a Weekend