Category: Deep Learning
- Demystifying the Sparse LLM Innovation by NVIDIA and Sakana AI
- How Can a Model Trained on 200M Real Tokens Match the Performance of One Trained on 360M Tokens of Data?
- AI Doesn't Need to Understand the World, But We Need to Understand AI
- Rotate Attention by 90 Degrees! Kimi's 'Attention Residuals' Take Off Today
- Nvidia's New Technique Cuts LLM Reasoning Costs by 8x Without Losing Accuracy
- Mining Activation Functions Like Crypto? DeepMind Builds a 'Compute Farm' to Brute-Force Search for the Next-Gen ReLU
- Stop Clipping Aggressively! Qwen Proposes GatedNorm, Unifying Perspectives on the Mysteries of Residual Flow
- Google's New Discovery: DeepSeek's Reasoning Splits into Multiple Personalities, with Left and Right Brains Competing for Intelligence
- Is the Transformer Dead? DeepMind Is Betting on Another AGI Path
- What to Do with Poor Pre-Training Data? Bengio's Team Introduces Explicit Bayesian Inference for Gradient-Free In-Context RL
- Optimization Is Geometry, Geometry Is Inference: Using Mathematics to End the Transformer Black-Box Era
- RLVR Training Costs Plummet by 98%! 12 PEFT Methods Go Head-to-Head, and the Results Are Surprising...
- Attention Is Not What You Need? Reframing Sequence Modeling with Geometric Aesthetics via Grassmann Manifolds
- Wenfeng Liang Co-Authors: DeepSeek Kicks Off the New Year with a New Macro-Architecture Chapter, Cracking Gradient Explosion and the Memory Wall
- [In-Depth] Ilya Sutskever's Selected Paper: The Platonic Representation Hypothesis
- SJTU PhD's Latest Insights: Clarifying Reinforcement Learning with Just Two Questions
- A New Perspective on NAS: Graph Neural Networks Drive a Universal Architecture Space, and Hybrid Convolution-Transformer Performance Leaps!
- Is Cancer Truly Close to Being Conquered by AI? Google Announces Two Breakthroughs in Two Days
- NTU and Others Propose A-MemGuard: Locking Down AI Memory and Cutting the Poisoning-Attack Success Rate by Over 95%
- The Mamba Architecture Heads to ICLR 2026: Can the Transformer, AI's Core Brain, Keep Its Throne?
- Recursive-Reasoning HRM Model Reimagined! TRM, a Two-Layer Network with 7M Parameters, Outperforms LLMs!
- In-Depth Dissection of Large Models: From DeepSeek-V3 to Kimi K2, Understanding Mainstream LLM Architectures
- Xiaohongshu Open-Sources Its First Multimodal Large Model, dots.vlm1, with Performance Rivaling SOTA!
- Google Open-Sources DeepPolisher, Halving Genome Assembly Error Rates; Jeff Dean: "Exciting!"
- Qwen Updates Overnight: Runs on an RTX 3090, with 3B Activated Parameters Rivaling GPT-4o