Category: Reinforcement Learning
- Process Supervision > Outcome Supervision! Huawei City University Reconstructs RAG Inference Training, 5k Samples Outperform 90k Model
- Reviewing the Progress of RL-Reasoning
- AI Learns Reasoning Solely by "Confidence": Zhejiang University Alumnus Replicates DeepSeek's Long Chain-of-Thought Emergence, Reinforcement Learning Needs No External Reward Signals
- Peking University Alumna Lilian Weng's Latest Blog Post: Why We Think
- Will the Vision of LSTM's Father from 22 Years Ago Come True? AI 'Self-Evolution' Papers Released in a Single Week, Is a New Trend Emerging?
- AI Math Ability Skyrockets 100%, Self-Evolution Nears RL Limits! CMU's New Work Overturns Perceptions
- First Explanation of How LLMs Reason and Reflect: Northwestern University & Google's New Framework Introduces Bayesian Adaptive Reinforcement Learning to Comprehensively Enhance Mathematical Reasoning
- LLM + RL Questioned: Deliberately Using Incorrect Rewards Still Significantly Boosts Math Benchmarks, Causing a Stir in the AI Community
- Summary! Multi-Turn Planning Techniques in 2025 for Large Language Model Agent RL Training
- Qwen Team Releases Long-Context Reasoning Model QwenLong-L1, Surpassing o3-mini
- Thinking with Images Only: Reinforcement Learning Forges a New Reasoning Model Paradigm, Maximizing Complex Scene Planning!
- How Does Claude 4 Think? Senior Researchers Respond: RLHF Paradigm is Out, RLVR Proven in Programming/Mathematics
- Large Models Break Go AI's "Black Box" for the First Time, Paving New Paths for Scientific Discovery! Shanghai AI Lab Releases New-Generation InternThinker
- ZeroSearch (Alibaba Technology): Large Language Models Learn Through Self-Rewarding Without a Browser
- Training a Model on Idle Computing Power Worldwide, Performance Comparable to R1, Jensen Huang's Sky Has Fallen! Karpathy Once Invested in It
- ZeroSearch: Zero-Search Reinforcement Incentivizes Model Potential, Ushering in a New Era for LLM Search Capability
- Stanford's Weak-for-Strong (W4S): Harnessing Stronger LLMs with Meta-Agent, Accuracy Boosted to 95.4% | Latest
- Can a single data point significantly enhance the mathematical reasoning performance of large models?
- The 'era of experience' will unleash self-learning AI agents across the web—here's how to prepare
- Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning
- NVIDIA's Llama Nemotron Series: Key Technologies Explained
- Why LLM Agents Perform Poorly: Google DeepMind Research Reveals Three Failure Modes, RL Fine-tuning Can Mitigate
- Bridging the Gap: LUFFY, a New Reinforcement Learning Paradigm for AI Reasoning
- AI's Second Half: From Algorithms to Utility