Category: Deep Learning
- How Do You Evaluate the Interaction Model Recently Released by Thinking Machines? - wangleineo's Answer
- ICML 2026 | Rejecting Brute Force, PRISM Framework Enables Efficient Test-Time Scaling for dLLMs
- Attention Is All You Need Author Returns: Can a 99% Sparse Transformer Be Even Faster?
- Token-Level, Precision Length Control: 3B Model Beats GPT 5.4 and Claude
- Hardcore: Google's Jeff Dean Says the Bottleneck for Million-Chip LLM Pre-training Has Been Completely Broken!
- Stanford's New Theory Unravels the Mystery of Neural Network Generalization, Adding One Line of Code to Adam Yields 2.4x Speedup
- Remove the Vision Encoder, and Multimodal Models Actually Get Stronger?
- What Did DeepSeek's Overnight Deleted New Paper Actually Say?
- Scaling Laws for Looped Transformers
- OCR Domain Adaptation Without Retraining from Scratch? Decoupling Language Models Reduces Computation by 95%
- Demystifying the Sparse LLM Innovation by NVIDIA and Sakana AI
- How Can a Model Trained on 200M Real Tokens Match the Performance of 360M Tokens of Data?
- AI Doesn't Need to Understand the World, But We Need to Understand AI
- Rotate Attention by 90 Degrees! Today, Kimi's 'Attention Residuals' Takes Off
- Nvidia's new technique cuts LLM reasoning costs by 8x without losing accuracy
- Mining Activation Functions Like Crypto? DeepMind Builds a 'Compute Farm' to Brute-Force Search for the Next-Gen ReLU
- Stop Clipping Aggressively! Qwen Proposes GatedNorm, Unifying the Perspective on Residual Flow Mysteries
- Google's New Discovery: DeepSeek Reasoning Splits into Multiple Personalities, Left and Right Brain Competing for Intelligence
- Is Transformer Dead? DeepMind Is Betting on Another AGI Path
- What to Do with Poor Pre-training Data? Bengio Team Introduces Explicit Bayesian Inference for Gradient-Free In-Context RL
- Optimization is Geometry, Geometry is Inference: Using Mathematics to End the Transformer Black Box Era
- RLVR Reinforcement Learning Training Costs Plummet 98%! 12 PEFT Methods Head-to-Head, Results Are Surprising...
- Attention Is Not What You Need? Reframing Sequence Modeling with Geometric Aesthetics via Grassmann Manifolds
- Co-authored by Wenfeng Liang: DeepSeek Kicks Off the New Year with a New Macro-Architecture Chapter, Cracking the Gradient Explosion and Memory Wall
- [In-Depth] Ilya Sutskever's Selected Paper: The Platonic Representation Hypothesis