Category: Reinforcement Learning

Abandoning Manual Annotation! Chinese Team Proposes Self-Evolution Algorithm for Multimodal Large Models
First Multi-Round LLM Router Unveiled: Router-R1 Teaches Large Models to "Think–Route–Aggregate"
New Work from Danqi Chen's Group at Princeton: RLHF Insufficient, RLVR Bounded? RLMT Forges a Third Path
ByteDance Breaks the 'Entropy Curse' in LLM RL Training, Enabling Models to Learn with Certainty!
Stanford Proposes New RL Paradigm: 3B Model Agent Outperforms Claude, GPT-4
Microsoft Introduces rStar2-Agent: "Thinking Smarter" Proves Far More Effective and Efficient Than Simply "Thinking Longer"
LLMs Dominate Math Boards, Yet Forget How to Chat? CMU et al. Reveal Striking Differences Between SFT and RL!
Evolution and Development Trends of Reinforcement Learning Frameworks
Advancing Silicon-Based Intelligence: Shuchao Bi's Insights on Past, Present, and Future AI
ARPO: Agentic Reinforced Policy Optimization, Enabling Agents to Explore One Step Further at Critical Moments
RAG Revolution! Graph-R1, the First RL-driven Graph Reasoning Agent
Revisiting Qwen3's Abandoned Hybrid Reasoning Mode
Why Can't Language Models Directly Output Answers with Confidence?
DeepSeek-GRPO Importance Weight Design Flaw? Explaining Qwen3's New Reinforcement Learning Algorithm GSPO
Counter-Intuitive RL Research: Directly Providing Answers to LLMs Is More Effective Than Detailed Step-by-Step Instructions!
Alibaba Open-Sources Breakthrough Agent Overnight, Directly Challenges OpenAI with State-of-the-Art Performance!
RL Scaling Breakthrough! DeepSWE Open-Source AI Agent Tops Leaderboard, Training Methods and Weights Fully Released
Tsinghua Research: A Reversal? Confirming RL Doesn't Truly Enhance Base Model Reasoning Ability!
Tsinghua and Others Propose Absolute Zero: Self-Play Large Models Achieve Top Performance on Multiple Tasks with Zero-Data Training
AGI Theory Comparison: Active Inference, Reinforcement Learning, Control Theory, Bayesian Brain, Utility Decision, Bounded Rationality, Emotional Motivation, Dynamic Homeostasis
LLMs Can Now Self-Update Weights, Significantly Enhancing Self-Adaptation and Knowledge Integration Capabilities – Has AI Awakened?
NVIDIA (ProRL) | Can RL Truly Enhance the Reasoning Capabilities of LLMs?
LLMs Can Now Self-Update Weights, Significantly Boosting Adaptive and Knowledge Integration Capabilities. Is AI Waking Up?
SRO Architecture Empowers Qwen-2.5-VL's Reasoning Capability, Boosting Performance by 16.8%
New Breakthrough in Large Model Reinforcement Learning: New SPO Paradigm Boosts Reasoning Capability!