Category: Reinforcement Learning

Stop Obsessing Over Outcome Rewards! CUHK Identifies and Solves the "Information Self-Locking" Problem in RL!
Karpathy Just Open-Sourced AutoResearch: I Used It to Optimize Lobster Skills, Boosting Success Rates from 56% to 92%
KARL: Knowledge Agents based on Reinforcement Learning
OpenClaw-RL: Allowing AI Agents to Self-Evolve Through Chat
Recent Google Publications: Two Notable Papers on Multi-Agent Systems
Are LLM RL Training Trajectories Actually Linear? Miaow Lab's Latest Work: Directly "Predict" Future Models Without Further Training!
What Exactly Is On-Policy Distillation? An In-Depth Interpretation of On-Policy/Self-Distillation
The Bitter Lesson! ROLL Team Shares: Practical Experience in Agentic RL Training
Xiaomi Introduces JudgeRLVR: Judge First, Generate Second — Breaking the Efficiency Paradox of "Long Chain-of-Thought" in Reasoning Models
Is PPO Dead? The Reinforcement Learning Foundation Used by DeepSeek Has Major Flaws!
Former OpenAI Researcher: AGI Requires Models to Break Through Difficulties on Their Own; The Biggest Problem is Generalization; The Most Important Skill is "Managing Junior Engineers"; Robots Will Have a "ChatGPT Moment" in Two to Three Years
DeepMind World Model Researcher: Is the Transformer Architecture Unimportant? The AGI Bottleneck Lies Elsewhere
What to Do with Poor Pre-Training Data? Bengio Team Introduces Explicit Bayesian for Gradient-Free In-Context RL
Let AI Level Itself Up: Meta Pushes Coding to Superintelligence with Self-play RL
LAMER: Meta-Reinforcement Learning Enables Language Agents to Perform Active Exploration
RLVR Reinforcement Learning Training Costs Plummet 98%! 12 PEFT Methods Head-to-Head, Results Are Surprising...
Breaking News! DeepSeek Officially Releases 2 Models
US Air Force Integrates AI into Advanced Wargaming
What? RLVR Isn't Learning New Knowledge—It's Learning How to Use Knowledge for Reasoning!
Xiaohongshu Proposes DeepEyesV2: From "Visual Thinking" to "Tool Collaboration", Exploring New Dimensions in Multimodal Intelligence
Microsoft Proposes GAD Framework: Open-Source Models Can Directly Distill Black-Box GPT-5
Reinforcement Learning + Large Model Memory: Mem-α, Enabling Agents to "Learn How to Remember" for the First Time
SJTU PhD's Latest Insights: Clarifying Reinforcement Learning with Just Two Questions
Meta's Two Latest Agent Learning Papers Are Quite Interesting!
The More You Fail, The Faster You Learn! Trajectory Rewriting Allows AI Agents to Create Perfect Experiences from Mistakes!