Category: Attention Mechanisms
- ModelBest's SALA Architecture Is Tearing Down the Transformer's Wall
- In-depth Dissection of Large Models: From DeepSeek-V3 to Kimi K2, Understanding Mainstream LLM Architectures
- Must-Read: In-depth Comparison of Mainstream LLM Architectures, Covering Llama, Qwen, DeepSeek, and Six Other Models
- Kimi K2's Key Training Technique: QK-Clip!