Category: Attention Mechanisms

ModelBest's SALA Architecture Is Tearing Down the Transformer's Wall
In-Depth Dissection of Large Models: From DeepSeek-V3 to Kimi K2, Understanding Mainstream LLM Architectures
Must-Read: In-Depth Comparison of Mainstream LLM Architectures, Covering Llama, Qwen, DeepSeek, and Six Other Models
Kimi K2's Key Training Technique: QK-Clip!