Category: Reinforcement Learning
- Stanford Proposes New RL Paradigm: 3B Model Agent Outperforms Claude, GPT-4
- Microsoft Introduces rStar2-Agent: "Thinking Smarter" Proves Far More Effective and Efficient Than Simply "Thinking Longer"
- LLMs Dominate Math Boards, Yet Forget How to Chat? CMU et al. Reveal Striking Differences Between SFT and RL!
- Evolution and Development Trends of Reinforcement Learning Frameworks
- Advancing Silicon-Based Intelligence: Shuchao Bi's Insights on Past, Present, and Future AI
- ARPO: Agentic Reinforced Policy Optimization, Enabling Agents to Explore One Step Further at Critical Moments
- RAG Revolution! Graph-R1, the First RL-driven Graph Reasoning Agent
- Revisiting Qwen3's Abandoned Hybrid Thinking Mode
- Why Can't Language Models Directly Output Answers with Confidence?
- A Design Flaw in DeepSeek GRPO's Importance Weights? Explaining Qwen3's New Reinforcement Learning Algorithm GSPO
- Counter-Intuitive RL Research: Directly Providing Answers to LLMs is More Effective Than Detailed Step-by-Step Instructions!
- Alibaba Open-Sources Breakthrough Agent Overnight, Directly Challenges OpenAI with State-of-the-Art Performance!
- RL Scaling Breakthrough! DeepSWE Open-Source AI Agent Tops Leaderboard, Training Methods and Weights Fully Released
- Tsinghua Research: A Reversal? Confirming RL Doesn't Truly Enhance Base Model Reasoning Ability!
- Tsinghua and Others Propose Absolute Zero, a Self-Play Large Model That Achieves Top Performance on Multiple Tasks with Zero-Data Training
- AGI Theory Comparison: Active Inference, Reinforcement Learning, Control Theory, Bayesian Brain, Utility Decision, Bounded Rationality, Emotional Motivation, Dynamic Homeostasis
- LLMs Can Now Self-Update Weights, Significantly Enhancing Self-Adaptation and Knowledge Integration Capabilities – Has AI Awakened?
- NVIDIA (ProRL) | Can RL truly enhance the reasoning capabilities of LLMs?
- LLMs Can Now Self-Update Weights, Significantly Boosting Adaptive and Knowledge Integration Capabilities. Is AI Waking Up?
- SRO Architecture Empowers Qwen-2.5-VL's Reasoning Capability, Boosting Performance by 16.8%
- New Breakthrough in Large Model Reinforcement Learning: The SPO Paradigm Boosts Reasoning Capability!
- SFT+RL Two-Stage Training Breaks Through LLM Self-Supervision! RUC DeepCritic Achieves Autonomous Evolution of AI Critique
- R1-like Training No Longer Focuses Only on Result Correctness! CUHK Launches SophiaVL-R1 Model
- The First Slow-Thinking Framework Dedicated to Multimodal Models! Outperforms GPT-o1 by Nearly 7 Percentage Points, Reinforcement Learning Teaches VLM to "Think Twice"
- 10 Lines of Code, 15% Improvement in AIME24/25! Unveiling the Entropy Mechanism in Large Language Model Reinforcement Learning