Category: Reinforcement Learning
- Recent Google Publications: Two Notable Papers on Multi-Agent Systems
- Are LLM RL Training Trajectories Actually Linear? Miaow Lab's Latest Work: Directly 'Predict' Future Models Without Further Training!
- What Exactly Is On-Policy Distillation? An In-Depth Interpretation of On-Policy/Self-Distillation
- The Bitter Lesson! ROLL Team Shares: Practical Experience in Agentic RL Training
- Xiaomi Introduces JudgeRLVR: Judge First, Generate Second — Breaking the Efficiency Paradox of "Long Chain-of-Thought" in Reasoning Models
- Is PPO Dead? The Reinforcement Learning Foundation Used by DeepSeek Has Major Flaws!
- Former OpenAI Researcher: AGI Requires Models to Break Through Difficulties on Their Own; The Biggest Problem is Generalization; The Most Important Skill is "Managing Junior Engineers"; Robots Will Have a "ChatGPT Moment" in Two to Three Years
- DeepMind World Model Researcher: Is the Transformer Architecture Unimportant? The AGI Bottleneck Lies Elsewhere
- What to Do with Poor Pre-training Data? Bengio Team Introduces an Explicit Bayesian Approach for Gradient-Free In-Context RL
- Let AI Level Itself Up: Meta Pushes Coding to Superintelligence with Self-play RL
- LAMER: Meta-Reinforcement Learning Enables Language Agents to Perform Active Exploration
- RLVR Reinforcement Learning Training Costs Plummet 98%! 12 PEFT Methods Head-to-Head, Results Are Surprising...
- Breaking News! DeepSeek Officially Releases 2 Models
- US Air Force Integrates AI into Advanced Wargaming
- What? RLVR Isn't Learning New Knowledge—It's Learning How to Use Knowledge for Reasoning!
- Xiaohongshu Proposes DeepEyesV2: From "Visual Thinking" to "Tool Collaboration", Exploring New Dimensions in Multimodal Intelligence
- Microsoft Proposes GAD Framework: Open-Source Models Can Directly Distill Black-Box GPT-5
- Reinforcement Learning + Large Model Memory: Mem-α, Enabling Agents to "Learn How to Remember" for the First Time
- SJTU PhD's Latest Insights: Clarifying Reinforcement Learning with Just Two Questions
- Meta's Two Latest Agent Learning Papers Are Quite Interesting!
- The More You Fail, The Faster You Learn! Trajectory Rewriting Allows AI Agents to Create Perfect Experiences from Mistakes!
- Abandoning Manual Annotation! Chinese Team Proposes Self-Evolution Algorithm for Multimodal Large Models
- First Multi-Round LLM Router Unveiled: Router-R1 Teaches Large Models to "Think–Route–Aggregate"
- New Work from Danqi Chen's Group at Princeton: RLHF Insufficient, RLVR Bounded? RLMT Forges a Third Path
- ByteDance Breaks the 'Entropy Curse' in LLM RL Training, Enabling Models to Learn with Certainty!