Category: Reinforcement Learning
- ZeroSearch: Zero-Search Reinforcement Incentivizes Model Potential, Ushering in a New Era for LLM Search Capability
- Stanford's Weak-for-Strong (W4S): Harnessing Stronger LLMs with Meta-Agent, Accuracy Boosted to 95.4%
- Can a single data point significantly enhance the mathematical reasoning performance of large models?
- The 'era of experience' will unleash self-learning AI agents across the web—here's how to prepare
- Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning
- NVIDIA's Llama Nemotron Series: Key Technologies Explained
- Why LLM Agents Perform Poorly: Google DeepMind Research Reveals Three Failure Modes, RL Fine-tuning Can Mitigate
- Bridging the Gap: LUFFY, a New Reinforcement Learning Paradigm for AI Reasoning
- AI's Second Half: From Algorithms to Utility