Category: Algorithm Optimization
- Make Reasoning Longer and More Accurate! The New Reinforcement Learning Algorithm FIPO Arrives
- Peking University Team Optimizes DeepSeek Attention: 4x Speed Increase Without Accuracy Loss
- Is PPO Dead? The Reinforcement Learning Foundation Used by DeepSeek Has Major Flaws!
- Microsoft Proposes GRPO-RoC: Trajectory Quality Filtering Is Key to Agentic RL
- A Design Flaw in DeepSeek's GRPO Importance Weights? Explaining GSPO, Qwen3's New Reinforcement Learning Algorithm