Today we released and open-sourced Ring-2.5-1T, the first trillion-parameter thinking model built on a hybrid linear attention architecture.
As a key step toward the era of general agents, we have massively scaled the hybrid linear attention architecture in both pre-training and reinforcement learning. On one hand, the efficient 1:7 MLA + Lightning Linear Attention architecture enhances the model's thinking efficiency and exploration space; on the other, scaling up reinforcement learning and agent environments improves its thinking depth and long-horizon execution capabilities.
Compared to the previously released Ring-1T, Ring-2.5-1T delivers significant improvements in generation efficiency, thinking depth, and long-horizon execution:
Efficient Generation: Thanks to the high proportion of linear attention layers, at generation lengths beyond 32K the memory-access volume is reduced by more than 10x and generation throughput increases by more than 3x, making the model particularly well suited to tasks requiring deep thinking and long-horizon execution.
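A back-of-envelope sketch of where this reduction comes from: in a 1:7 hybrid, only one layer in every eight keeps a cache that grows with sequence length, and that layer uses MLA's compressed latent rather than full GQA keys/values. The layer counts and dimensions below are assumed toy values, not the actual Ring-2.5-1T configuration.

```python
def kv_bytes_full(n_layers, seq_len, kv_dim, dtype_bytes=2):
    """Per-step attention-state bytes if every layer caches GQA-style KV."""
    return n_layers * seq_len * kv_dim * dtype_bytes

def kv_bytes_hybrid(n_layers, seq_len, mla_dim, state_dim, dtype_bytes=2, period=8):
    """1-in-`period` layers cache a compressed MLA latent that grows with
    seq_len; the rest hold a fixed-size linear-attention state."""
    mla_layers = n_layers // period
    linear_layers = n_layers - mla_layers
    return (mla_layers * seq_len * mla_dim + linear_layers * state_dim) * dtype_bytes

# Assumed toy configuration (not the real model's dimensions):
n_layers, gqa_kv_dim, mla_dim, state_dim = 64, 1024, 576, 128 * 1024
for seq_len in (8_192, 32_768, 131_072):
    ratio = kv_bytes_full(n_layers, seq_len, gqa_kv_dim) / kv_bytes_hybrid(
        n_layers, seq_len, mla_dim, state_dim)
    print(f"{seq_len:>7} tokens: ~{ratio:.1f}x less attention-state traffic per step")
```

With these illustrative numbers the per-step reduction already exceeds 10x at 32K tokens and keeps growing with sequence length, since the linear layers' state stays constant.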
Deep Thinking: By adding a dense reward on top of RLVR to provide feedback on the rigor of the thinking process, Ring-2.5-1T reaches gold medal level on both IMO 2025 and CMO 2025 (self-tested).
Long-horizon Execution: Large-scale fully asynchronous agentic RL training significantly enhances autonomous long-horizon execution on complex tasks, allowing Ring-2.5-1T to plug easily into agent programming frameworks such as Claude Code and personal AI assistants such as OpenClaw.
Deep Thinking and Long-horizon Execution
To evaluate the deep thinking and long-horizon execution capabilities of Ring-2.5-1T, we selected representative open-source thinking models (DeepSeek-v3.2-Thinking, Kimi-K2.5-Thinking) and closed-source APIs (GPT-5.2-thinking-high, Gemini-3.0-Pro-preview-thinking-high, Claude-Opus-4.5-Extended-Thinking) as references. Ring-2.5-1T achieves leading open-source results both on high-difficulty reasoning tasks in mathematics, coding, and logic (IMOAnswerBench, AIME 26, HMMT 25, LiveCodeBench, ARC-AGI-V2) and on long-horizon task execution in agentic search, software engineering, and tool invocation (Gaia2-search, Tau2-bench, SWE-Bench Verified).
We also tested a heavy thinking mode, which achieves test-time scaling by expanding parallel thinking and summarization during inference, effectively enhancing both the depth and breadth of reasoning.
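The orchestration pattern behind this kind of test-time scaling can be sketched as follows: sample several reasoning chains in parallel, then run a final summarization pass over the candidates. The `generate` function here is a hypothetical stand-in for any chat-completion call, not the actual heavy-thinking implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def generate(prompt: str) -> str:
    # Placeholder: call the model API here (e.g. an OpenAI-compatible endpoint).
    return f"<answer to: {prompt}>"

def heavy_thinking(question: str, n_parallel: int = 8) -> str:
    # Sample n_parallel independent reasoning chains in parallel.
    with ThreadPoolExecutor(max_workers=n_parallel) as pool:
        drafts = list(pool.map(generate, [question] * n_parallel))
    # Summarization pass: synthesize the candidates into one final answer.
    summary_prompt = (
        "Synthesize the strongest final answer from these candidate solutions:\n\n"
        + "\n\n".join(f"[Candidate {i + 1}]\n{d}" for i, d in enumerate(drafts))
    )
    return generate(summary_prompt)

print(heavy_thinking("Prove that the sum of two even integers is even."))
```

Widening `n_parallel` trades extra compute for breadth of exploration, while the summarization step adds depth by letting the model compare and reconcile its own candidates.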
On IMO 2025 (out of 42), Ring-2.5-1T scored 35, reaching gold medal level; on CMO 2025 (out of 126), it scored 105, well above both the gold medal line (78) and the national team selection line (87). Comparing the solutions of Ring-2.5-1T and Ring-1T, we find the former markedly stronger in the rigor of its reasoning, its use of advanced proof techniques, and the completeness of its write-ups. The detailed IMO 2025 and CMO 2025 solutions from Ring-2.5-1T are now publicly available at: https://github.com/inclusionAI/Ring-V2.5/tree/main/examples
Moreover, on the challenging agentic-search GAIA2-search task, Ring-2.5-1T achieves open-source SOTA. The GAIA2 environment emphasizes cross-application tool collaboration and complex task execution; Ring-2.5-1T excels in both the efficiency and the accuracy of its plan generation and multi-step tool invocation.
Trillion-scale Hybrid Linear Attention Architecture
In the era of general agents, deep thinking and long-horizon execution are becoming basic working paradigms for foundation language models. This shift places extremely high demands on a foundation model's decoding efficiency during long-horizon reasoning. As a key step toward an agentic model architecture, Ling 2.5 introduces a hybrid linear attention design on top of the Ling 2.0 architecture. Through incremental training, the GQA layers of Ling 2.0 are upgraded to a 1:7 MLA + Lightning Linear structure. Specifically, following the technical route of the previously released Ring-flash-linear-2.0, we convert a portion of the GQA layers into Lightning Linear Attention to substantially improve throughput in long-horizon reasoning scenarios. To further compress the KV cache, we approximately convert the remaining GQA layers into MLA, and adopt features such as QK Norm and Partial RoPE to strengthen the expressive power of Ling 2.5 under the hybrid attention design.
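The 1:7 interleaving described above can be sketched as a simple per-layer layout: one softmax-attention (MLA) layer per block of eight, with the other seven using Lightning Linear Attention. The exact position of the MLA layer within each block is an assumption here, not a disclosed detail of the model.

```python
def ling25_attention_layout(n_layers, period=8):
    """Attention type per layer for a 1:7 hybrid stack: one MLA (softmax)
    layer per block of `period` layers, the rest Lightning Linear."""
    return ["MLA" if (i + 1) % period == 0 else "lightning_linear"
            for i in range(n_layers)]

layout = ling25_attention_layout(16)
print(layout)  # 2 MLA layers, 14 lightning_linear layers
```

During decoding, only the `MLA` layers need a growing per-token cache; each `lightning_linear` layer carries a fixed-size recurrent state, which is what keeps long-generation memory traffic nearly flat.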
Ling 2.5 Architecture at 1T Scale
After the transformation, Ring-2.5-1T's activated parameter count increases from 51B to 63B, yet with the hybrid linear attention architecture its inference efficiency still improves significantly over Ling 2.0. Even against the KIMI K2 architecture, which activates only 32B parameters, the Ling 2.5 architecture at 1T scale retains a clear throughput advantage in long-horizon inference scenarios, and the advantage widens as generation length grows.
Single machine 8-card H20-3e, batch size = 64, decode throughput comparison under different generation lengths
Single machine 8-card H200, batch size = 64, decode throughput comparison under different generation lengths
Hand-crafted Case
We integrated Ring-2.5-1T into Claude Code. To test its long-horizon software development capabilities, we prompted it to automatically develop a miniature operating system (TinyOS) as follows:
1. System Boot Process:
- Use GRUB as the bootloader, following the Multiboot standard
- Write a boot.asm assembly file to set up the basic CPU mode (32-bit protected mode)
- Jump from assembly to kernel_main function in main.c
2. Core Function Implementation:
- Screen Output: Implement simple character display functions (e.g., clear screen, print strings)
- Interrupt Handling: Set up basic GDT and IDT, handle keyboard input interrupts
- Memory Management: Implement basic memory paging initialization
- Keyboard Support: Able to receive keyboard input and echo to screen
3. Code Structure:
- Provide complete linker.ld linking script
- Provide Makefile for compiling and generating ISO image
- Each key function must have clear comments
4. Code Requirements:
- Ensure code is concise, modular, avoid unnecessary complexity
- Prioritize implementing a working minimal feature set
- Reserve interfaces for future extension
First output the complete list of code files with brief descriptions, then provide the full code for each file.
All generated code must be directly compilable and runnable, with specific compilation and testing methods provided.
You must ensure that this operating system can actually be run using qemu.
Ring-2.5-1T ran in Claude Code for 2 hours and 8 minutes and ultimately completed the task. A detailed recording is in the following video:
We then continued to let Ring-2.5-1T enrich TinyOS's functionality with the following prompt:
Good, now continue development: implement bash functionality so that qemu can boot into a bash command interface that executes simple commands such as ls, pwd, and cat.
The final developed TinyOS is shown in the following video:
We also integrated Ring-2.5-1T into the personal AI assistant OpenClaw to help with reading AI-infrastructure literature and illustrating technical concepts with Java code.
Limitations and Future Plans
This version of the model still has shortcomings in token efficiency and instruction following, and there remains significant room for improvement in long-horizon execution and in actually delivering on more realistic, complex tasks. We will continue to improve these capabilities in future versions and warmly welcome feedback and suggestions from the community. Training of Ring-2.5-1T is still ongoing; the complete technical report will be released with the next version.
Additionally, note that the GAIA2 benchmark evaluation above uses the community's widely adopted OpenAI function-call format rather than the original ReAct format. The relevant evaluation configuration and protocol will be submitted to the GAIA2 GitHub repository to enable broader, reproducible comparison by the community.
You are welcome to visit our open-source repositories and experience pages to download and try the model:
🤗 Hugging Face: https://huggingface.co/inclusionAI/Ring-2.5-1T
🤖 ModelScope: https://modelscope.cn/models/inclusionAI/Ring-2.5-1T
The Ring-2.5-1T chat experience page and API service on Ling Studio (https://ling.tbox.cn/chat) and ZenMux (https://zenmux.ai/) will launch soon.
Click "Read Original" to access the Hugging Face page for Ring-2.5-1T.