Category: Tool Use
- Agent-World: Scaling Real-World Environments for Co-Evolution of Agents and Environments
- Microsoft Proposes GRPO-RoC: Trajectory Quality Filtering Is Key to Agentic RL
- ARPO: Agentic Reinforced Policy Optimization, Enabling Agents to Explore One Step Further at Critical Moments
- Summary: Multi-Turn Planning Techniques for Large Language Model Agent RL Training in 2025