Category: Large Language Models

When Async Agentic RL Meets 'Amnesia for Old Policies': Rethinking Off-Policy Correction
HIT KaLM-Reranker-V1: An Efficient RAG Retrieval Booster, Deep Dive into High-Speed Reranking
Large Models Finally Stop Swearing! Toxic Subword Pruning ToxPrune: Dual Defense at Pre-training and Inference
Community Submission | Bailing Ling & Ring 2.6 Technical Report Released: Efficient Trillion-Parameter Models for Real-World Agent Workflows
ACL 2026 | Why Does SFT Always Fail to Learn? Not Every SFT Failure Needs More Epochs! Five "Surgical Scalpels" to Fix SFT
The End of Code Review: Coding Agents Supersede Human Inspection
Anthropic's First Mythic-Class Claude 5 Officially Unleashed
Anthropic Co-Authors New Paper: Why Do Larger Models Learn More? The Answer Lies in Scaling
How to Build a Reliable Agent Memory Framework? UC Berkeley's MemFail Stress-Tests 4 Top Memory Systems, Proving Vector Databases Aren't the Only Answer
Open Source Update | Are LLMs Still Stuck with 'Goldfish Memory'? New Benchmark RHELM Tests the Ceiling of 'Real Long-Term Memory'
Latest Discovery: Large AI Models Know When They're Being Evaluated
Google and Cornell's New Research: The Next Step for LLMs Is Learning How to 'Sleep' Well
Meta Skill Has Just Arrived
Models Are Too Fond of Cheating! Cursor Reveals the Inside Story of Composer 2's Reinforcement Learning: Models Can Detect 'Fake Environments', and Floating-Point Non-Determinism Is a Fatal Flaw in RL Training
Stop Handwriting Skills! Microsoft’s Latest Research: Train Your Skills Like Neural Networks
Unbelievable… ModelBest Had AI Write a Training Framework, and It Trained the Strongest 1B Model by Itself: MiniCPM5-1B
On 520, Meet China's New 'Model King' Qwen3.7-Max!
Burning Too Many Tokens Using Claude Code on Large Codebases? This Open-Source Tool Cuts Tool Calls by 92%
35B-Parameter Science Model Rivals Trillion-Parameter Giants: 'Intern' Science Model Intern-S2-Preview Open-Sourced
WWW'26 | A New Paradigm for Cross-Task Adaptive Multi-Agent Collaboration
Math Majors in Crisis! Fields Medalist Tests ChatGPT 5.5 Pro, Which Reaches a Thesis-Level Result in 17 Minutes
Zero Index, Zero Embedding, Pure Grep: DCI Does Deep Research Directly on Raw Corpora
Attention Is All You Need Author Returns: Can a 99% Sparse Transformer Be Even Faster?
The Ghost of Markov: From Predicting the Next Word to Predicting the Next Action
Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies in LLM Reinforcement Learning