Category: Model Architecture
- Mamba-3
- Rotate Attention by 90 Degrees! Today, Kimi's 'Attention Residuals' Takes Off
- Breaking Static Model Weights! Tencent Hunyuan Releases Real-Time "Brain-Switching" Technology for Inference
- Stop Clipping Aggressively! Qwen Proposes GatedNorm, a Unified Perspective on the Mysteries of the Residual Stream
- Less is More: Recursive Reasoning with Tiny Networks
- Zhipu's New Model Also Uses DeepSeek's MLA, Runs on Apple M5
- Liang Wenfeng Listed as Author: DeepSeek Opens the New Year with a New Macro-Architecture, Tackling Gradient Explosion and the Memory Wall