Following the release of Intern-S1-Pro, the first trillion-parameter science model in the 'Intern' family, the Shanghai Artificial Intelligence Laboratory (Shanghai AI Lab) open-sourced a preview of its next-generation model, Intern-S2-Preview, on May 15. The release pushes the capability boundary of "deeply specialized general-purpose models" further while significantly lowering the barrier to entry. Its key breakthroughs include:

Smaller Size: With a 35B parameter scale, it achieves capabilities comparable to trillion-parameter models across multiple core domains.

Stronger Scientific Capabilities, Breakthrough in Structure Generation: The research team improved the performance of a small-parameter model on complex scientific tasks by increasing task difficulty and diversity. For instance, by introducing a real-number prediction module, the team made Intern-S2-Preview the first open-source general-purpose large model able to generate material crystal structures.

Leading Scientific Agent Capabilities, Better Serving Real Research Scenarios: The model ranks among the leaders of its size on programming tasks across scientific scenarios, and it surpasses mainstream closed-source models such as Claude-Haiku-4.5 and GPT-5.4-Nano on scientific discovery tasks.

At the same time, Intern-S2-Preview deepens its integration with the Ascend computing ecosystem, with end-to-end optimization across training, inference, and evaluation. This further validates the value of the domestic software-hardware collaborative system for scientific large models.

Try It Online:

https://chat.intern-ai.org.cn/

GitHub Link:

https://github.com/InternLM/Intern-S1

HuggingFace Link:

https://huggingface.co/collections/internlm/intern-s2

ModelScope Link:

https://modelscope.cn/collections/Shanghai_AI_Laboratory/Intern-S2

Scoring of Intern-S2-Preview against mainstream models on scientific and general task evaluation benchmarks

Exploring Task Scaling and Reinforcement Learning to Accelerate "General-Specialized Fusion"

Condensing a trillion-parameter scientific multimodal large model into an efficient, easy-to-use base model is extremely challenging. The approach grows out of Shanghai AI Lab's continued exploration of the "general-specialized fusion" technical route. The research team found that model capability does not depend solely on traditional parameter scaling and data augmentation: increasing task difficulty and enriching task diversity also keeps raising the model's capability ceiling, and this improvement exhibits a scaling effect of its own.

Compared with Intern-S1-Pro, Intern-S2-Preview extends specialized scientific tasks into a "full-chain training" paradigm: each specialized scientific task comes with high-quality data and training strategies that span pre-training through post-training, and a stable, efficient training infrastructure makes multi-task fusion training possible. When a large number of difficult, diverse tasks are trained together in this way, a small model can reach the performance of a trillion-parameter model on multiple scientific tasks. The key is the full-chain "general-specialized fusion" mechanism. If only a single training stage is optimized, capabilities often trade off against one another; after full-chain fusion, different tasks instead reinforce each other, further unlocking the model's overall potential on complex scientific tasks.

On this basis, the team explored reinforcement learning along several lines to help Intern-S2-Preview realize "general-specialized fusion" faster:

Guiding the model to use chain-of-thought on specialized scientific tasks such as biological multi-omics understanding. The generalization advantage of chain-of-thought lets a 35B-parameter model reach performance comparable to trillion-parameter models.
Extending the number of reinforcement learning training steps and combining them with harder (e.g., graduate-level) disciplinary reasoning problems and specialized scientific tasks. This lets the small model train thoroughly on a wide range of problems, so that it integrates what it has learned and gains cross-domain reasoning capabilities.
Exploring new algorithms such as chain-of-thought folding, guided by the concept of Intelligence Quality per Token (IQPT), so that the intelligence delivered by each token drives performance gains. Notably, on mathematical reasoning tasks, Intern-S2-Preview drastically compresses chain-of-thought length while matching the results of a recent model with nearly 300B parameters, a gain in both performance and efficiency.
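The article does not give a formula for IQPT. One simple way to read the idea, purely as an assumption for illustration, is task accuracy divided by the average number of tokens the model generates per problem. The sketch below uses that reading; the `EvalRun` type, the `quality_per_kilotoken` function, and all numbers are hypothetical and are not taken from the Intern-S2-Preview report.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    """Aggregate results of one evaluation run (hypothetical record type)."""
    correct: int           # number of problems answered correctly
    total: int             # number of problems attempted
    generated_tokens: int  # total tokens generated across all problems


def accuracy(run: EvalRun) -> float:
    """Fraction of problems answered correctly."""
    if run.total <= 0:
        raise ValueError("total must be positive")
    return run.correct / run.total


def quality_per_kilotoken(run: EvalRun) -> float:
    """Assumed IQPT proxy: accuracy per 1,000 generated tokens per problem."""
    mean_tokens = run.generated_tokens / run.total
    if mean_tokens <= 0:
        raise ValueError("generated_tokens must be positive")
    return accuracy(run) / (mean_tokens / 1000.0)


if __name__ == "__main__":
    # Made-up numbers: a long chain-of-thought run vs. a compressed one.
    long_cot = EvalRun(correct=80, total=100, generated_tokens=1_200_000)
    short_cot = EvalRun(correct=78, total=100, generated_tokens=300_000)
    for name, run in [("long", long_cot), ("short", short_cot)]:
        print(name, accuracy(run), quality_per_kilotoken(run))
```

Under this reading, a run that loses a little accuracy but uses far fewer tokens scores higher, which matches the article's framing of shorter chains of thought as an efficiency gain.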

Infographic illustrating the concept of Intelligence Quality Per Token and its impact on model efficiency

Continuous Upgrade in Scientific Capabilities, Rivaling Mainstream Closed-Source Models

With empowering scientific discovery as its core objective, Intern-S2-Preview targets more complex scientific scenarios. One example is modeling the structural space of small molecules, which underpins the model's ability to accurately understand microscopic structures such as molecules and crystals. This capability not only sets the upper limit on the accuracy of structure understanding and generation but is also the foundation for adapting to complex research scenarios. Building on earlier innovations such as introducing Fourier Position Encoding (FoPE) and rebuilding the temporal encoder, the research team further strengthened this capability and introduced a real-number prediction module, making Intern-S2-Preview the first open-source general-purpose large model able to generate material crystal structures.

Visualization comparing crystal structures generated by Intern-S2-Preview with other models

To verify this capability precisely, the team ran targeted tests on the MolecularIQ benchmark. The benchmark assesses a model's spatial modeling and topological understanding of the internal structure of molecules, which is significantly harder than traditional tasks that can be solved from the molecular formula alone. The results: Intern-S2-Preview scored 57.26 on MolecularIQ, surpassing Gemini-3.1-Pro's 41.33.

If structural understanding mainly serves the analysis and screening stages of research, structure generation is a "creative task" that drives scientific innovation. Material crystal structure generation has long relied on specialized models. Intern-S2-Preview not only fills this gap for open-source general-purpose large models but is also the first structure generation model that can show its reasoning process. The task requires the model to generate dozens of high-precision spatial coordinates that describe the crystal structure. The structure pass rate of closed-source models such as GPT-5.5 is about 10%, whereas Intern-S2-Preview's exceeds 40%, which significantly improves the quality and usability of the generated structures and gives scientific innovation efficient support.

With these innovations, Intern-S2-Preview shows the potential to perform high-precision coordinate regression without relying on diffusion models. This lowers the implementation cost of such tasks and offers a new technical approach for other research tasks that involve coordinate regression.

Upgraded Scientific Agent Capabilities, Efficiently Supporting Complex Research Tasks

Thanks to a systematic task synthesis method introduced during training, Intern-S2-Preview's general agent capabilities have improved further. Drawing on the open-source community's skill repository and real tool ecosystems, the team built high-quality agent training data that closely reflects real-world application scenarios. The focus was on strengthening step decomposition, skill invocation, and autonomous execution of complex tasks, extending the model's capability boundary from multi-turn dialogue to complex task planning and autonomous execution.

On long-horizon tasks in real sandbox environments, Intern-S2-Preview demonstrated robust task understanding, tool invocation, multi-step decision-making, and state tracking on general agent benchmarks such as PinchBench. It can keep executing tasks in dynamic environments and correct itself based on environmental feedback. In addition, with its scientific reasoning continuing to improve, Intern-S2-Preview performed strongly on SciCode, a benchmark focused on scientific programming and algorithm solving, ranking among the top models of its size. Its scientific code generation efficiently supports complex research tasks such as scientific computing, algorithm development, and research script writing.

Co-evolution of "Algorithm-System-Computing Power" to Enhance Training and Inference Efficiency

The research team performed systematic optimization around model training, inference deployment, and automated evaluation, enhancing training and inference efficiency through the co-evolution of "algorithm-system-computing power."

On the Ascend A3 super node, the training framework introduced multiple device-memory and host-memory optimization techniques to improve the stability of multimodal long-sequence training. For variable-length inputs, the computation was further optimized by planning data chunking in advance and reducing data exchange between host and device, which improves overall computational efficiency.
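As an illustration of what planning data chunking in advance can look like for variable-length inputs, the sketch below packs samples into fixed token budgets on the host before any training step runs, using a first-fit-decreasing heuristic. This is a generic technique chosen for illustration, not the method used in XTuner or on Ascend hardware; `plan_chunks`, `cumulative_offsets`, and the token budget are hypothetical names and values.

```python
from itertools import accumulate
from typing import List, Sequence


def plan_chunks(lengths: Sequence[int], max_tokens: int) -> List[List[int]]:
    """Group sample indices into chunks whose total length fits max_tokens.

    First-fit decreasing: visit samples from longest to shortest and put
    each into the first chunk that still has room. The whole plan is
    computed once on the host, so the device only receives packed chunks.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    remaining: List[int] = []      # free token budget of each chunk
    chunks: List[List[int]] = []   # sample indices assigned to each chunk
    for i in order:
        n = lengths[i]
        if n > max_tokens:
            raise ValueError(f"sample {i} ({n} tokens) exceeds the budget")
        for c, free in enumerate(remaining):
            if free >= n:
                remaining[c] -= n
                chunks[c].append(i)
                break
        else:
            # No existing chunk has room, so open a new one.
            remaining.append(max_tokens - n)
            chunks.append([i])
    return chunks


def cumulative_offsets(chunk: Sequence[int], lengths: Sequence[int]) -> List[int]:
    """Start offsets of each sample inside a packed chunk, plus the end."""
    return list(accumulate((lengths[i] for i in chunk), initial=0))


if __name__ == "__main__":
    sample_lengths = [5, 3, 8, 2, 4, 6]  # made-up sequence lengths
    for chunk in plan_chunks(sample_lengths, max_tokens=10):
        print(chunk, cumulative_offsets(chunk, sample_lengths))
```

Because every chunk has a known, bounded size ahead of time, the per-step work reduces to one transfer of the packed tokens and their offsets, which is one way to cut the host-device traffic the paragraph mentions.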

For integrated training and inference, the team built on the training framework XTuner and the deployment inference framework LMDeploy, adding a shared weight calculation method on top of support for multi-token prediction in reinforcement learning. This reduces inconsistencies between training and inference while improving the validity of generated results, making training more stable and inference more efficient.

To address the disproportionate share of time consumed by the vision module during multimodal long-sequence training, the team simulated offline the share of compute used by the vision and language modules at different sequence lengths and used the results to allocate resources more evenly, further improving overall training efficiency.

Since the first release of the Intern large model in 2023, Shanghai AI Lab has gradually built a rich family of Intern models. It also pioneered and open-sourced a full-chain tool system for large model R&D and application, including the training framework XTuner, the deployment inference framework LMDeploy, the innovative open evaluation system OpenCompass, and the intelligent document parsing engine MinerU, forming an active open-source community of hundreds of thousands of developers.

Since its release, Intern-S1 has topped the HuggingFace global multimodal leaderboard multiple times, with cumulative downloads exceeding 1 million. Its superior cross-modal scientific understanding capability not only provides efficient tools for research but also lowers the barrier for global research teams to enter AGI for Science through open source. In the future, the Shanghai AI Lab will continue to promote model capability improvement and research paradigm innovation, working with global partners to build a more open and efficient scientific AI ecosystem.

Group photo or promotional image for the Intern large model family and community

35B-Parameter Science Model Rivals Trillion-Parameter Giants: 'Intern' Science Model Intern-S2-Preview Open-Sourced
