Google Gemini 3.1 Pro Dominates Benchmarks, Tsinghua's Yao Shunyu Strikes! Claude and GPT Forced into a Corner

New Intelligence Yuan Report

Editors: Haokun, Taozi

[New Intelligence Yuan Briefing] Google DeepMind dropped a bombshell late at night, officially unveiling its next-generation flagship model, Gemini 3.1 Pro. On the notoriously difficult ARC-AGI-2 test it posted the highest score to date, stunning Silicon Valley: its reasoning performance more than doubled over the previous generation, dethroning Claude Opus 4.6.

Gemini 3.1 Pro Announcement

Following Gemini 3 Pro, Google DeepMind has finally unleashed its ultimate move!

Just now, the next-generation flagship model Gemini 3.1 Pro made a late-night debut, shattering SOTA records across all domains to become the new AI king.

Gemini 3.1 Pro Performance Chart

Benchmark Comparison

Following Deep Think, Tsinghua alumnus Yao Shunyu also participated in the development of Gemini 3.1 Pro.

This time, Gemini 3.1 Pro represents an epic leap in large-scale model reasoning capabilities.

On the extremely rigorous ARC-AGI-2 test, it scored a remarkable 77.1%, more than double the result of the previous-generation 3 Pro.

Add a near-perfect 98% on ARC-AGI-1, and both the reasoning-heavy Claude Opus 4.6 and the specially tuned GPT-5.2 have been left in the dust.

ARC-AGI Benchmark Results

Model Comparison Chart 1

Model Comparison Chart 2

From the SVG comparison test below, one can intuitively feel the massive generation gap in capabilities between 3.1 Pro and 3 Pro.

SVG Animation Comparison

In coding and reasoning domains, Gemini 3.1 Pro similarly dominates, comprehensively crushing Sonnet 4.6 and GPT-5.2.

In the AAII comprehensive evaluation, 3.1 Pro topped the chart, leading Claude Opus 4.6 by a full 4 points in total score while costing less than half as much in API calls.

AAII Benchmark Chart

Starting today, Gemini 3.1 Pro is officially available in Gemini and NotebookLM. Developers can get early access through Google AI Studio, Antigravity, and Android Studio.

Availability Announcement

Now, the AI battlefield in Silicon Valley has fundamentally shifted, with only heavyweight players Google DeepMind and Anthropic left to face off.

OpenAI, previously enjoying the limelight, seems to be gradually losing its initiative on this main battlefield.


Gemini 3.1 Pro Late Night Raid

Comprehensive SOTA Scores Doubled

As Google's most formidable model to date, 3.1 Pro achieves a comprehensive leap beyond 3 Pro.

It not only possesses native full-modal input capabilities but also supports super-long contexts of up to 1 million tokens.

Context Window Comparison

In the performance benchmarks most closely watched by the industry, Gemini 3.1 Pro demonstrates breathtaking dominance.

On Humanity's Last Exam (HLE), Gemini 3.1 Pro scored 44.4% without tool assistance, ahead of both Opus 4.6 (40.0%) and GPT-5.2 (34.5%).

On ARC-AGI-2, Gemini 3.1 Pro scored an astonishing 77.1%, leaving Opus 4.6 (68.8%), which had taken the top spot only two days earlier, trailing behind.

Even more striking is its leap forward in coding and AI agent tasks.

In LiveCodeBench Pro, it posted an Elo score of 2887, leaving peers in the dust;

In Terminal-Bench 2.0, its score of 68.5% beat the code-specialized GPT-5.3-Codex (64.7%);

In APEX-Agents, it achieved a commanding 33.5%, compared to Opus 4.6's 29.8% and GPT-5.2's mere 23.0%.
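To put these margins side by side, here is a minimal Python sketch that computes Gemini 3.1 Pro's lead over the best rival on each benchmark. It uses only the percentages quoted in this article, so "best rival" means the best among the models named here, not necessarily the full leaderboard. The function name `lead_over_best_rival` is illustrative.

```python
# Scores quoted in this article, in percent (higher is better).
# Only models the article names are included for each benchmark.
scores = {
    "HLE (no tools)": {"Gemini 3.1 Pro": 44.4, "Opus 4.6": 40.0, "GPT-5.2": 34.5},
    "ARC-AGI-2": {"Gemini 3.1 Pro": 77.1, "Opus 4.6": 68.8},
    "Terminal-Bench 2.0": {"Gemini 3.1 Pro": 68.5, "GPT-5.3-Codex": 64.7},
    "APEX-Agents": {"Gemini 3.1 Pro": 33.5, "Opus 4.6": 29.8, "GPT-5.2": 23.0},
}


def lead_over_best_rival(results, model="Gemini 3.1 Pro"):
    """Return the lead, in percentage points, over the best-scoring other model."""
    best_rival = max(score for name, score in results.items() if name != model)
    return round(results[model] - best_rival, 1)


leads = {bench: lead_over_best_rival(results) for bench, results in scores.items()}

for bench, lead in leads.items():
    print(f"{bench}: +{lead} points")
```

By this count the widest gap among the four is on ARC-AGI-2, while the Terminal-Bench 2.0 and APEX-Agents leads are under four points each.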

Coding Benchmark Chart 1

Coding Benchmark Chart 2

Beyond hardcore reasoning, Gemini 3.1 Pro also shows its muscle in processing lengthy texts.

On the MRCR v2 128k long-context test, it scored a high 84.9%.

More striking still, it is the only model with a result on the ultimate 1M-token test, scoring 26.3%, while the competing GPT-5.2 and Opus 4.6 are simply listed as "not supported" at this length.

Long Context Benchmark

More importantly, compared to the previous generation, 3.1 Pro has significantly reduced hallucination rates.

Hallucination Rate Comparison

Hand-Crafted God-Level Applications, This is the Killer AI

What 3.1 Pro brings is not just benchmark crushing but a comprehensive evolution in logical reasoning capabilities.

It can now crack extremely tricky logic puzzles, and in practical applications it shows a striking ability to reshape productivity.

Whether transforming obscure concepts into intuitive diagrams, condensing massive data into clear charts, or turning wild creativity into reality, 3.1 Pro handles them all with ease.

Application Examples

Code-Based Animation

With just a simple text prompt, 3.1 Pro can directly generate SVG animations that can be seamlessly embedded into web pages.

Best of all, because these animations are built entirely from code, they stay crisp at any scale and have far smaller file sizes than traditional video.
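To make the point about code-built animations concrete, here is a minimal Python sketch that assembles a tiny animated SVG by hand and measures its size. It is not output from Gemini 3.1 Pro. The shape, colors, and the function name `build_pulsing_circle_svg` are illustrative assumptions, and the animation uses the standard SVG `<animate>` element.

```python
import xml.etree.ElementTree as ET


def build_pulsing_circle_svg(size: int = 200, duration_s: float = 2.0) -> str:
    """Return a self-contained animated SVG document as a string."""
    center = size // 2
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {size} {size}">'
        f'<circle cx="{center}" cy="{center}" r="20" fill="steelblue">'
        # <animate> loops the radius from 20 to 60 and back, indefinitely.
        f'<animate attributeName="r" values="20;60;20" dur="{duration_s}s" '
        'repeatCount="indefinite"/>'
        "</circle></svg>"
    )


svg_text = build_pulsing_circle_svg()

# Parsing confirms the markup is well-formed XML (raises ParseError otherwise).
root = ET.fromstring(svg_text)

svg_bytes = len(svg_text.encode("utf-8"))
print(f"Animated SVG size: {svg_bytes} bytes")
```

Because the drawing is described by a `viewBox` and vector shapes rather than pixels, the same few hundred bytes render sharply at any display size, and the string can be pasted directly into an HTML page.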

Integrating Complex Systems

Powerful reasoning capabilities also allow 3.1 Pro to completely break down barriers between complex APIs and human-friendly design.

For example, it can directly build a real-time aerospace data dashboard, perfectly connecting to open telemetry data streams to clearly display the real-time operational trajectory of the International Space Station before your eyes.

Interactive Design

3.1 Pro can even write extremely complex 3D starling murmuration effects in pure code, creating an entire immersive experience for you.

In this system, you can "conduct" the flock in real-time through gesture tracking technology while hearing generative music that evolves in real-time with the flock's dynamics.

This is absolutely a powerful tool for researchers and designers developing multimodal interactive interface prototypes.

Creative Programming

More interestingly, 3.1 Pro can transform classic literary themes into truly executable exquisite code.

For example, when asked to design a modern-style personal homepage for "Wuthering Heights," the model precisely captured the oppressive, brooding atmosphere of the original work and generated a minimalist, modern interface that conveys the soul of the protagonist.


Stunning First Tests Across the Web, Dominating SVG

Google UX Engineer Michael Chang put it straight to the test, using 3.1 Pro to simulate complex urban planning and instantly generate a bird's-eye topology for a brand-new city.

City Planning Visualization

From a one-sentence prompt, 3.1 Pro produced an 11-second SVG animation within 3 minutes.

SVG Animation Generated

SVG Animation Demo

In another SVG test, its generated "seal balancing a ball" is also visually stunning.

Seal Balancing Ball Animation

AI expert Simon Willison tested it by having 3.1 Pro generate a clear pelican SVG with legs outlined within 5 minutes.

Pelican SVG Generation

In 3D spatial reasoning, 3.1 Pro is also the new SOTA.

3D Spatial Reasoning Demo

The 3D pixel-art Pokemon generated by 3.1 Pro is far superior to the one from 3 Pro.

3D Pixel Pokemon Comparison

3D Pokemon Animation

Additionally, 3.1 Pro can generate polished interactive animations showing the entire process of a seed sprouting and growing into a large tree.

Tree Growth Animation Frame

Tree Growth Animation

Evolution Has No End, Only a Stronger Next Chapter

Starting today, the Gemini 3.1 Pro preview is officially released, and this is just the beginning.

Google stated that from November last year to today, authentic user feedback has accelerated every iteration.

Gemini Evolution Timeline

Gemini 3.1 Pro's late-night raid is another reshaping of the AI industry landscape.

With an iteration speed that borders on muscle-flexing, Google DeepMind is telling the world:

In the deep waters leading to AGI, only players with tightly coupled hardware compute and algorithmic depth can secure tickets to the second half.

References:

https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/

https://x.com/Google/status/2024519455389192204?s=20

https://deepmind.google/models/model-cards/gemini-3-1-pro/
