Category: Reinforcement Learning
- Is Synthetic Data Better Than Real Data?
- SortedRL: Accelerates Large Model RL Training by 50%, Boosting Efficiency by 18%
- Lin Junyang Speaks Out for the First Time After Leaving Alibaba: Reviewing Qwen's Detours, Pointing to AI's New Path
- Let AI 'Refine' Its Own Data! DataChef Goes Open Source: Using Reinforcement Learning to Automatically Generate LLM Data Recipes
- NVIDIA Nemotron-Cascade 2 Technical Report
- ICLR 2026 | How Far Can Unsupervised Reinforcement Learning Go for Large Models? A Systematic Answer from the Tsinghua Team
- Stop Obsessing Over Outcome Rewards! CUHK Identifies and Solves the "Information Self-Locking" Problem in RL!
- Karpathy Just Open-Sourced AutoResearch: I Used It to Optimize Lobster Skills, Boosting Success Rates from 56% to 92%
- KARL: Knowledge Agents based on Reinforcement Learning
- OpenClaw-RL: Allowing AI Agents to Self-Evolve Through Chat
- Recent Google Publications: Two Notable Papers on Multi-Agent Systems
- Are LLM RL Training Trajectories Actually Linear? Miaow Lab's Latest Work: Directly 'Predict' Future Models Without Further Training!
- What Exactly Is On-Policy Distillation? An In-Depth Interpretation of On-Policy/Self-Distillation
- The Bitter Lesson! ROLL Team Shares: Practical Experience in Agentic RL Training
- Xiaomi Introduces JudgeRLVR: Judge First, Generate Second — Breaking the Efficiency Paradox of "Long Chain-of-Thought" in Reasoning Models
- Is PPO Dead? The Reinforcement Learning Foundation Used by DeepSeek Has Major Flaws!
- Former OpenAI Researcher: AGI Requires Models to Break Through Difficulties on Their Own; The Biggest Problem is Generalization; The Most Important Skill is "Managing Junior Engineers"; Robots Will Have a "ChatGPT Moment" in Two to Three Years
- DeepMind World Model Researcher: Is the Transformer Architecture Unimportant? The AGI Bottleneck Lies Elsewhere
- What to Do with Poor Pre-Training Data? Bengio Team Introduces an Explicit Bayesian Approach for Gradient-Free In-Context RL
- Let AI Level Itself Up: Meta Pushes Coding to Superintelligence with Self-play RL
- LAMER: Meta-Reinforcement Learning Enables Language Agents to Perform Active Exploration
- RLVR Training Costs Plummet 98%! 12 PEFT Methods Head-to-Head, Results Are Surprising...
- Breaking News! DeepSeek Officially Releases 2 Models
- US Air Force Integrates AI into Advanced Wargaming
- What? RLVR Isn't Learning New Knowledge—It's Learning How to Use Knowledge for Reasoning!