Jensen Huang Enters the OpenClaw Arena! Most Powerful Open-Source 'Lobster' Model Rivals Opus 4.6

New智元 Report

Editor: Taozi, Hao Kun

【New智元 Introduction】

OpenClaw welcomes another heavyweight player! Late last night, NVIDIA unveiled Nemotron 3 Super, a 120-billion-parameter model built specifically for agents, with performance rivaling Claude Opus 4.6. Inference speed has surged 3x and throughput 5x. This 'Lobster' is aiming for the sky.

The world's most valuable company has also entered the OpenClaw battlefield!

Last night, NVIDIA unveiled its new-generation open-source model Nemotron 3 Super, built specifically for large-scale AI agents.

It features 120 billion parameters, 12 billion activated parameters, and a 1 million token context window. Inference speeds have surged by 3 times, and throughput has skyrocketed by 5 times.

Nemotron 3 Super adopts an innovative Mamba-MoE hybrid architecture, completely solving performance bottlenecks in multi-agent collaboration.

Moreover, it is the first model in the 'Nemotron 3 family' to achieve the following three breakthroughs:

• Native pre-training using NVFP4 precision;

• A new LatentMoE hybrid expert architecture that optimizes 'accuracy per compute unit' and 'accuracy per parameter' to the extreme;

• Introduction of MTP (Multi-Token Prediction) layers, enabling native 'speculative decoding' to accelerate inference speeds.

On the Pinchbench benchmark, Nemotron 3 Super stands alone, firmly securing the top spot in open-source models.

In terms of OpenClaw task success rate, it achieved a high score of 85.6%, with performance directly rivaling Claude Opus 4.6 and GPT-5.4.

It can be said that the 'strongest open-source model' perfectly adapted for OpenClaw has been born!

Today, Nemotron 3 Super has open-sourced its pre-training and post-training datasets exceeding 10 trillion tokens, complete training methodologies, and 21 reinforcement learning environments.

Address: https://huggingface.co/collections/nvidia/nvidia-nemotron-v3

NVIDIA's 120 Billion Parameter Behemoth Shakes the Stage

Perfect Match for OpenClaw

Nowadays, as chatbots evolve into multi-agent applications, developers usually run into 'two walls'.

The first is context explosion.

The number of tokens generated by multi-agent workflows is up to 15 times higher than in regular conversations.

This is because every interaction requires resending the complete history, including tool outputs and intermediate reasoning processes.

When executing long-cycle tasks, this massive volume of context data not only drives up costs but also easily leads to 'goal drift', meaning the agent gradually deviates from its originally set goals.
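The arithmetic behind that explosion is easy to sketch. A minimal back-of-the-envelope model, where the 500-token turn size is an invented illustration rather than a figure from the report:

```python
# Back-of-the-envelope model of context explosion: if every turn must
# resend the full history, total tokens processed grow quadratically in
# the number of turns. The 500-token turn size is illustrative.

def total_tokens(turns: int, tokens_per_turn: int = 500) -> int:
    """Tokens processed when turn t replays turns 1..t in full."""
    return sum(t * tokens_per_turn for t in range(1, turns + 1))

isolated = 10 * 500          # 10 turns processed independently
replayed = total_tokens(10)  # 10 turns with full-history replay
print(replayed, replayed // isolated)  # 27500 tokens, a 5x-plus blow-up
```

Even at ten turns the replayed total is more than five times the isolated cost, and the gap keeps widening with every additional step.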

The second is the 'Thinking Tax'.

Complex agents must reason at every step, but calling an LLM for every sub-task makes multi-agent applications extremely expensive and slow to respond, making them difficult to implement in real-world scenarios.

To this end, NVIDIA's open-source Nemotron 3 Super is built to knock down both of these walls.

Paper Address: https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Super-Technical-Report.pdf

As mentioned above, Nemotron 3 Super possesses a 1 million token context window.

Especially when running in the OpenClaw environment, AI can retain the entire workflow state in memory, ensuring logical consistency from the first step to the last.

On Artificial Analysis, Nemotron 3 Super has refreshed the SOTA, ranking first on both efficiency and open-source leaderboards.

Among open-source models of similar scale, the new model's accuracy is far ahead.

Meanwhile, the NVIDIA AI-Q research AI agent powered by the new model has taken first place on the DeepResearch Bench and DeepResearch Bench II leaderboards.

Over the next five years, NVIDIA will invest $26 billion to build world-class open-source models.

Hybrid Architecture Revolution, Throughput Soars 5x

This time, NVIDIA has restructured the underlying architecture of Nemotron 3 Super.

The 88-layer network adopts a periodic alternating arrangement, where Mamba-2 layers are responsible for efficient sequence modeling, providing linear time complexity.

A small number of Transformer attention layers are interspersed as 'global anchors', responsible for long-distance information routing across positions and high-precision reasoning.

As a result, compared to the previous generation Nemotron Super model, throughput has increased by up to 5 times, and accuracy by up to 2 times.
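The layer arrangement can be sketched as a simple schedule. The 88-layer depth comes from the report; the exact placement and ratio of attention layers is not disclosed here, so the every-eighth-layer period below is purely an assumption for illustration:

```python
# Hypothetical sketch of a periodic Mamba/attention layer schedule.
# The 88-layer depth is from the report; placing an attention layer
# every 8th layer is an assumption, not NVIDIA's actual layout.

def make_layer_schedule(num_layers: int = 88, attn_period: int = 8) -> list[str]:
    """Return a layer-type list: mostly Mamba-2 blocks, with a
    Transformer attention layer interspersed every `attn_period` layers
    to act as a global anchor for long-range routing."""
    schedule = []
    for i in range(num_layers):
        if (i + 1) % attn_period == 0:
            schedule.append("attention")   # global information routing
        else:
            schedule.append("mamba2")      # linear-time sequence modeling
    return schedule

schedule = make_layer_schedule()
print(len(schedule), schedule.count("attention"), schedule.count("mamba2"))
# 88 11 77 -> under this assumption, only 11 of 88 layers pay quadratic cost
```

Because the vast majority of layers run in linear time, long contexts stop dominating the compute bill, which is where the throughput gains come from.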

Compared to GPT-OSS-120B and Qwen3.5-122B, Nemotron 3 Super achieved the highest scores in all categories.

Moreover, with an input sequence length of 8k and an output sequence length of 64k, its throughput is up to 2.2 times and 7.5 times higher than GPT-OSS-120B and Qwen3.5-122B, respectively.

LatentMoE: Expert Design Understanding Hardware, Squeezing Accuracy from Every Byte

More importantly, Nemotron 3 Super introduces 'Latent MoE' for the first time.

The LatentMoE solution is very clever: before routing and expert computation, tokens are projected from the hidden dimension d to a smaller latent dimension ℓ. Routing and expert computation are both performed in this much smaller dimension.

This means the number of expert parameters to load and the cross-card communication volume are directly reduced by a factor of d/ℓ!

The saved resources can then be used to increase the total number of experts and the number of experts activated each time by the same factor. This is equivalent to 'freely' gaining a boost in accuracy with almost no change in inference cost.

NVIDIA's official blog post explains it more intuitively: spend the compute cost of 1 expert to activate 4 experts.

Compared to traditional MoE, LatentMoE is superior in both parameter utilization and compute utilization.
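A minimal numerical sketch of the LatentMoE idea follows. The dimensions, expert count, and the tiny tanh "experts" are all invented for illustration; this is not NVIDIA's implementation:

```python
import numpy as np

# Toy LatentMoE: project tokens from hidden dim d to latent dim ell,
# route and run the experts entirely in the latent space, then project
# back. Expert parameter loads and cross-card traffic shrink ~ d / ell.
# All sizes and the tanh "experts" are invented for illustration.

rng = np.random.default_rng(0)
d, ell, n_experts, top_k = 512, 128, 16, 4

W_down = rng.standard_normal((d, ell)) / np.sqrt(d)    # d -> ell
W_up   = rng.standard_normal((ell, d)) / np.sqrt(ell)  # ell -> d
router = rng.standard_normal((ell, n_experts))
experts = [rng.standard_normal((ell, ell)) / np.sqrt(ell)
           for _ in range(n_experts)]

def latent_moe(x: np.ndarray) -> np.ndarray:           # x: (tokens, d)
    z = x @ W_down                                     # into latent space
    logits = z @ router
    topk = np.argsort(logits, axis=-1)[:, -top_k:]     # chosen experts
    gates = np.take_along_axis(logits, topk, axis=-1)
    gates = np.exp(gates) / np.exp(gates).sum(-1, keepdims=True)
    out = np.zeros_like(z)
    for t in range(z.shape[0]):
        for g, e in zip(gates[t], topk[t]):
            out[t] += g * np.tanh(z[t] @ experts[e])   # tiny latent expert
    return out @ W_up                                  # back to hidden dim

y = latent_moe(rng.standard_normal((4, d)))
print(y.shape, d // ell)   # output stays (4, 512); ~4x smaller experts
```

Since each expert is an ell-by-ell matrix instead of d-by-d, the d/ell = 4 savings can be spent on four times as many active experts at roughly the same cost, which is the "spend 1 expert, activate 4" framing above.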

Multi-Token Prediction: Performance and Inference Efficiency, Two Birds with One Stone

Nemotron 3 Super also adds a powerful weapon: Multi-Token Prediction (MTP), achieving both model quality and inference efficiency.

Traditional training predicts only the next token; MTP instead requires the model to predict several future tokens at once at each position.

This actually forces the model to understand causal relationships between multiple steps and longer-term text structures.

It has been proven that this move is very effective, with real improvements in validation set loss and downstream benchmark scores.
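The change in training target is easy to illustrate. A sketch of how MTP targets could be constructed, where the head count k=3 and the exact offsets are assumptions:

```python
# Sketch of multi-token-prediction training targets: at position t the
# model must predict token[t+1], ..., token[t+k] instead of just the
# next token. The head count k=3 is an assumption for illustration.

def mtp_targets(tokens: list[int], k: int = 3) -> list[tuple[int, ...]]:
    """For each position with k full future tokens, return the k-tuple
    of targets the MTP heads are trained to predict."""
    return [tuple(tokens[t + 1 : t + 1 + k])
            for t in range(len(tokens) - k)]

print(mtp_targets([10, 11, 12, 13, 14], k=3))
# [(11, 12, 13), (12, 13, 14)]
```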

Beyond becoming smarter, the greatest utility of MTP is achieving native speculative decoding.

These extra prediction heads act as a 'draft model' built directly into the model itself.

During inference, the prediction heads first quickly draft a few candidate subsequent tokens, and then the main model verifies all these drafts in a single forward pass.

This move significantly reduces generation latency, and compared to adding an external independent draft model, the additional compute overhead (FLOPs) is negligible.
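The draft-then-verify loop can be sketched with toy stand-ins for the two models. Greedy acceptance is shown for clarity; production systems use a rejection-sampling rule so the output distribution matches the main model exactly:

```python
# Toy draft-then-verify loop for speculative decoding. Greedy acceptance
# is shown for clarity; real schedulers use rejection sampling so the
# output distribution matches the main model exactly.

def speculative_step(draft_fn, verify_fn, prefix, k=4):
    """draft_fn proposes k candidate tokens; verify_fn returns the main
    model's greedy token at each drafted position in one forward pass.
    Accept the longest agreeing prefix plus the first verified token."""
    draft = draft_fn(prefix, k)
    verified = verify_fn(prefix, draft)
    accepted = []
    for d, v in zip(draft, verified):
        accepted.append(v)              # verified token is always safe
        if d != v:                      # first mismatch ends the run
            break
    return prefix + accepted

# Toy models: the "main model" counts upward; the draft is right 3 times.
verify = lambda prefix, draft: [prefix[-1] + i + 1 for i in range(len(draft))]
draft  = lambda prefix, k: [prefix[-1] + 1, prefix[-1] + 2, prefix[-1] + 3, 99][:k]

print(speculative_step(draft, verify, [0]))
# [0, 1, 2, 3, 4] -> four tokens emitted for one verification pass
```

When the drafts agree, several tokens come out of a single main-model forward pass, which is exactly where the latency reduction comes from.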

Native NVFP4 Precision Pre-training

As Bryan Catanzaro, VP of Research at NVIDIA, stated, Nemotron 3 Super is designed specifically for Blackwell.

During the pre-training phase, the team ran the entire process using NVFP4 precision on the Blackwell platform, significantly reducing memory requirements.

Moreover, with zero accuracy loss, the new model's inference speed is 4 times faster than FP8 on the Hopper architecture.
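For intuition, here is a rough sketch of FP4 (E2M1) block quantization in the spirit of NVFP4. The real format pairs small blocks of values with FP8 block scales plus a tensor-level scale; the plain float scale per block below is a simplification:

```python
import numpy as np

# Rough sketch of FP4 (E2M1) block quantization in the spirit of NVFP4.
# The real format stores FP8 scales per small block plus a tensor-level
# scale; here each 16-value block just gets one plain float scale.

E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-E2M1[:0:-1], E2M1])        # signed FP4 grid

def quantize_dequantize(x: np.ndarray, block: int = 16) -> np.ndarray:
    """Round each block to the nearest FP4 grid point after scaling so
    the block's max magnitude maps to 6.0 (the largest E2M1 value)."""
    out = np.empty_like(x)
    for i in range(0, len(x), block):
        chunk = x[i : i + block]
        m = np.abs(chunk).max()
        scale = m / 6.0 if m > 0 else 1.0
        idx = np.abs(GRID[None, :] - (chunk / scale)[:, None]).argmin(1)
        out[i : i + block] = GRID[idx] * scale
    return out

x = np.array([0.1, -0.7, 3.2, 6.0, 0.0, 1.4, -2.9, 0.05] * 2)
print(np.abs(x - quantize_dequantize(x)).max() <= 0.5)  # True: bounded error
```

Each weight fits in 4 bits instead of 8 or 16, which is where the memory savings and the bandwidth-bound speedups come from.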

25 Trillion Tokens + 21 RL Environments, Targeting AI Agents

Like the previous Nemotron 3 Nano, Nemotron 3 Super was trained on 25 trillion tokens of text data.

The entire pre-training is divided into two steps:

Phase 1: Consumes 80% of the data (20 trillion tokens), focusing on data diversity and breadth of knowledge. The corpus covers 16 major categories, ranging from web crawling to code, mathematics, academic papers, and multilingual data.

Phase 2: Consumes the remaining 20% (5 trillion tokens), which consists entirely of carefully selected high-quality data. The weights of Wikipedia, high-quality PDFs, and STEM reasoning data are significantly increased, specifically to boost accuracy.
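The two-phase split can be expressed as a simple budget table. The phase budgets match the figures above; the per-category weights are invented placeholders, since the report's exact mixture is not reproduced here:

```python
# Budget sketch of the two-phase pre-training mix: 80% of the 25T-token
# corpus in the broad phase 1, 20% in the quality-weighted phase 2.
# The per-category weights are invented placeholders for illustration.

TOTAL_TOKENS = 25 * 10**12
PHASES = {
    "phase1": {"budget": TOTAL_TOKENS * 8 // 10,   # 20T: diversity/breadth
               "mix": {"web": 0.5, "code": 0.2, "math": 0.1,
                       "papers": 0.1, "multilingual": 0.1}},
    "phase2": {"budget": TOTAL_TOKENS * 2 // 10,   # 5T: curated quality
               "mix": {"wikipedia": 0.3, "hq_pdfs": 0.3,
                       "stem_reasoning": 0.4}},
}

for name, phase in PHASES.items():
    alloc = {cat: int(phase["budget"] * w) for cat, w in phase["mix"].items()}
    print(name, f"{phase['budget']:,}", alloc)
```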

The resulting 'base model' achieved 86.01 on MMLU, 75.65 on MMLU-Pro, and 84.84 on MATH, leaving top-tier models of similar scale far behind.

For post-training, NVIDIA has poured its skill points into 'AI agent capabilities'.

SFT Stage: Trained on over 7 million samples and 80 billion tokens. In the data mix, agent-related tasks account for as high as 36%, far exceeding dialogue (23%) and reasoning (31%).

The scale of agent training data grew especially sharply. For conversational tool calling alone, it jumped from 5 domains and 15,588 dialogues in the previous Nano generation to 838 domains and 279,116 dialogues.

RL Stage: Even more elaborate, proceeding in four steps:

Step 1: Multi-Environment RLVR. Trained simultaneously on 21 environments and 37 datasets, covering mathematics, code, STEM, safety, dialogue, instruction following, long context, puzzles, and various agent tasks. Each step samples 256 prompts and generates 16 responses per prompt.

Step 2: SWE-RL. Specifically trained for software engineering capabilities, investing 20B tokens. Each rollout starts a container, running an agent loop in a real code repository, generating code patches, and verifying them with real test cases.

Step 3: RLHF. 18B tokens. Trained a GenRM reward model based on Qwen3-235B to precisely regulate behavior on identity recognition and safety topics.

Step 4: MTP Recovery. Freeze the model backbone and only train the MTP prediction heads to realign the accuracy of speculative decoding.
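The sampling pattern in Step 1 can be sketched as follows. The batch sizes match the figures above, but the toy task, the 0/1 verifiable reward, and the GRPO-style group normalization are illustrative assumptions, not the actual trainer:

```python
import random
import statistics

# Toy version of the Step 1 sampling loop: 256 prompts per step, 16
# responses per prompt, a verifiable 0/1 reward, and GRPO-style
# within-group reward normalization. The task, reward, and
# normalization are illustrative assumptions, not the actual trainer.

PROMPTS_PER_STEP, RESPONSES_PER_PROMPT = 256, 16   # figures from the text

def rlvr_step(prompts, sample_fn, reward_fn):
    batch = random.sample(prompts, min(PROMPTS_PER_STEP, len(prompts)))
    groups = []
    for p in batch:
        responses = [sample_fn(p) for _ in range(RESPONSES_PER_PROMPT)]
        rewards = [reward_fn(p, r) for r in responses]
        mean = statistics.mean(rewards)
        std = statistics.pstdev(rewards) or 1.0    # guard all-equal groups
        advantages = [(r - mean) / std for r in rewards]
        groups.append((p, responses, advantages))
    return groups

random.seed(0)
# Toy verifiable task: reward 1.0 iff the "response" is the prompt doubled.
groups = rlvr_step(list(range(300)),
                   sample_fn=lambda p: p * random.choice([1, 2]),
                   reward_fn=lambda p, r: 1.0 if r == 2 * p else 0.0)
print(len(groups), len(groups[0][1]))   # 256 groups of 16 rollouts each
```

Because the reward is checkable by a program rather than a learned judge, scoring the roughly 4,096 rollouts per step stays cheap and unambiguous.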

How effective is this top-tier AI agent training secret? A few numbers explain everything:

  • Achieved 60.47% on SWE-Bench (OpenHands), far exceeding GPT-OSS-120B's 41.9%;
  • Reached 91.75% in the RULER@1M long-context test, while GPT-OSS-120B only scored 22.3%;
  • Scored 90.21% on AIME25 math reasoning, almost tying with Qwen3.5-122B's 90.36%.

'Lobster' Players Win Big

Thousands of Pages of Reports Load into Memory in Seconds

Nemotron 3 Super's high-precision tool-calling capabilities allow OpenClaw agents to achieve leapfrog evolution in multiple fields.

In software development, AI agents can load the 'entire codebase' into the context at once.

Without cumbersome document splitting, end-to-end code generation, vulnerability fixing, and automated debugging can be achieved.

In financial analysis scenarios, Nemotron 3 Super can directly load reports thousands of pages long into memory.

This eliminates the trouble of repeatedly re-reasoning in lengthy conversations, significantly improving work efficiency.

With its tool-calling capabilities, Nemotron 3 Super can also enable autonomous agents to reliably navigate and operate within vast function libraries, preventing execution errors in high-risk, critical environments such as autonomous security orchestration in cybersecurity.

Now, a large number of 'Lobster' players can use it directly.

Currently, Perplexity has integrated Nemotron 3 Super for user search, making it one of the 20 orchestration models in Computer.

Companies providing software-development AI agents, such as CodeRabbit, Factory, and Greptile, have combined it with their own models in their agent products.

Life-science and frontier-AI institutions like Edison Scientific and Lila Sciences will also use Nemotron 3 Super to power their agents for deep literature retrieval, data science, and molecular-structure understanding.

NVIDIA's OpenClaw is Coming

Having a model is not enough; NVIDIA has brought the platform along this time.

According to WIRED, NVIDIA is secretly building an open-source AI agent platform named NemoClaw, specifically targeting the enterprise market.

Just by the name, you can tell: 'Nemo' corresponds to the Nemotron model family, and 'Claw' points directly to OpenClaw.

In plain English, NVIDIA plans to use its own models to build an enterprise-grade OpenClaw.

What is the biggest difference from OpenClaw? Security.

OpenClaw is popular among individual players, but enterprises dare not touch it. NemoClaw is aimed at this pain point.

Reports indicate that NemoClaw has built-in security and privacy tools from the start, giving enterprises peace of mind.

Moreover, it is completely open-source; it can be used regardless of whether your system runs on NVIDIA chips.

Why open-source? The logic is simple. The more agents are used, the greater the demand for computing power, and NVIDIA still profits.

Nemotron 3 Super is the engine, and NemoClaw is the chassis. Model + Platform, working together.

This time, NVIDIA is handing enterprises a 'ready-to-use' AI agent full package.

OpenClaw let individual players taste the sweetness, but NVIDIA clearly does not intend to let anyone else have the enterprise market cake.

References:

https://blogs.nvidia.com/blog/nemotron-3-super-agentic-ai/

https://wccftech.com/nvidia-unveils-nemotron-3-super-as-an-open-agentic-ai-model/

https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Super-Technical-Report.pdf

https://pinchbench.com/


