Category: Large Language Models

A New Revolution in Reward Models! SWIFT Reads "Inner Voice" Instead of Text, Creating a Faster, Stronger, and More Cost-Effective AI Judge
The "Mirage" of Chain-of-Thought Reasoning: An In-depth Look at LLM Generalization
GPT-5 vs Claude Opus 4.1: Coding Capability Assessment
In-depth Dissection of Large Models: From DeepSeek-V3 to Kimi K2, Understanding Mainstream LLM Architectures
ARPO: Agentic Reinforced Policy Optimization, Enabling Agents to Explore One Step Further at Critical Moments
Open-Sourcing the Largest High-Quality Scientific Reasoning Post-Training Dataset to Quickly Turn Qwen3 and Others into "Scientists"
Wang Mengdi's Team Review of "Self-Evolving Agents": From Static LLMs to Artificial Superintelligence (ASI)
Anthropic Team Uncovers 'Persona Variables' to Control Large Language Model Behavior, Cracking the Black Box of AI Madness
Is Your Model's Attention Drifting? RUC and Tsinghua University Introduce LeaF: Pruning Distracting Tokens for Focused Learning
Can Models Truly "Reflect on Code"? Beihang University Releases Repository-Level Understanding and Generation Benchmark, Refreshing the LLM Understanding Evaluation Paradigm
ReaGAN: Empowering Each Node as an Intelligent Reasoning Expert in Graphs
Google's Challenge: DeepSeek, Kimi and More to Compete in First Large Model Showdown Starting Tomorrow
RAG Revolution! Graph-R1, the First RL-driven Graph Reasoning Agent
RAG Can Also Reason! Thoroughly Solving the Multi-Source Heterogeneous Knowledge Challenge
Beyond Human Annotation: Meta Introduces CoT-Self-Instruct – Reshaping LLM Training with 'Reasoning-Driven Self-Evolution'
A Deep Dive: Where Does Large Model Training Time Go?
Revisiting Qwen3's Abandoned Mixed Inference Mode
DeepSeek R2's Secret Weapon Revealed! The Technology That Just Won Liang Wenfeng a Top Prize Lets AI Read Long Texts 11 Times Faster
Qwen Updates Overnight: Runs on an RTX 3090 and Rivals GPT-4o with Only 3B Activated Parameters
Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces
Do Multimodal Large Language Models Truly 'Understand' the World? — Unveiling Core Knowledge Deficits in MLLMs
Hierarchical Reasoning Model
DeepSeek-GRPO Importance Weight Design Flaw? Explaining Qwen3's New Reinforcement Learning Algorithm GSPO
Must-Read: In-depth Comparison of Mainstream LLM Architectures, Covering Llama, Qwen, DeepSeek, and Six Other Models
Kimi K2's Key Training Technique: QK-Clip!