Category: AI Agents

The Current State and Dilemmas of AI Agents: MIT, Cambridge, Stanford and Others Jointly Publish Analysis Report
Codex Lead Reveals OpenAI Internal Development: Reinvented Weekly! Codex Has Evolved into a Teammate, Can Run Overnight and Self-Test! Advice for Newcomers: Fundamentals Never Go Out of Style; Windows Version Coming Soon
OpenAI Codex Product Lead: Programming as We Know It Is Over
Qwen3.5: Towards Native Multimodal Agents
The Bitter Lesson! ROLL Team Shares: Practical Experience in Agentic RL Training
WebMCP: A Bomb Google Planted in Chrome 146
Just Released: Claude 4.6 and GPT-5.3-Codex Simultaneously Launched!
Is Cursor No Longer Cool? A Top 0.01% Expert Defects to Claude, and Their 10,000-Word Defection Notes Go Viral!
Let AI Level Itself Up: Meta Pushes Coding to Superintelligence with Self-play RL
LAMER: Meta-Reinforcement Learning Enables Language Agents to Perform Active Exploration
Can Large Models Handle Precision Work Too?! MIT Top Conference Paper Teaches AI to Operate Industrial CAD Software
The Significance of Gemini 3: AI Has Moved Past the 'Hallucination Phase' and Is Approaching Humans; 'Human-Machine Collaboration' Will Shift from 'Humans Correcting AI' to 'Humans Guiding AI Work'
Claude Launches Skills Feature and Agent Skills Development Guide
Letting CoT "Evolve" with the Environment: AgileThinker Achieves "Thinking While Doing" | Latest from Tsinghua
Reinforcement Learning + Large Model Memory: Mem-α, Enabling Agents to "Learn How to Remember" for the First Time
The More You Fail, the Faster You Learn! Trajectory Rewriting Allows AI Agents to Create Perfect Experiences from Mistakes!
The Two Major Pain Points of Agent Long-Range Search Have Been Solved! CAS DeepMiner Runs Nearly 100 Rounds with 32k Context, Open-Source Performance Closes In on Closed Source
Abandoning Fine-Tuning: Stanford Co-releases Agentic Context Engineering (ACE), Boosting Model Performance by 10% and Reducing Token Costs by 83%
Google Enters the CUA Battleground, Launches Gemini 2.5 Computer Use: Allowing AI to Directly Operate the Browser
Stanford Proposes New RL Paradigm: 3B Model Agent Outperforms Claude, GPT-4
OpenAI Board Chair: "Per-Token Billing" Is Completely Wrong, Market Will Eventually Choose "Outcome-Based Pricing"
ARPO: Agentic Reinforced Policy Optimization, Enabling Agents to Explore One Step Further at Critical Moments
RAG Can Also Reason! Thoroughly Solving the Multi-Source Heterogeneous Knowledge Challenge
OpenAI Podcast Revisited: The AI Coding War! Developers Are the Most Fortunate: Specialized Code Models Will Emerge! Host Leaks: "I Like Claude the Most!"
RL Scaling Breakthrough! DeepSWE Open-Source AI Agent Tops Leaderboard, Training Methods and Weights Fully Released