Gemini 3.5 Pro Leaked: Coding Performance Rivals GPT-5.5! Google Finally Gets Serious

New Intelligence Report

Editor: Hao Kun

【New Intelligence Guide】 Gemini 3.5, codenamed 'Cappuccino', has been leaked ahead of schedule, skipping directly from the 3.2 naming convention! A brand new 24-hour Agent called 'Spark' can manage your emails, run tasks, and even spend your money and place orders without asking you.

Just moments ago, Gemini 3.5 was prematurely exposed!

User Lentils released the latest news: the Gemini 3.5 Pro checkpoint, codenamed 'Cappuccino', has already started producing outputs.

Just a few hours ago, the rumor was still about Gemini 3.2, and unexpectedly, it has now been replaced with Gemini 3.5.

With this jump in naming from 3.2 to 3.5, Google clearly wants to tell a bigger story at I/O.

Major Gemini Update: Google Plays its Trump Card

Just a day earlier, the well-known leaker can was the first to share a batch of outputs.

One was an interactive blueprint breakdown of a DualShock 4 controller, and the other was a vector illustration of a pelican riding a bicycle, complete with a 7-dimensional customization panel allowing real-time switching of frame color, lighting, headwear, basket contents, and pedaling speed.

Judging by the screenshots, this is no longer a simple SVG; it's a fully interactive web application generated from a single prompt!

Abacus.AI CEO Bindu Reddy subsequently released even more explosive data—

3.2 Flash achieved 92% of GPT-5.5's performance in coding and reasoning, while being 15 to 20 times cheaper.

That's not all; Google's brand new, always-on Agent "Gemini Spark" was also unearthed.

It's visible that it can not only be on standby around the clock, managing your emails and running tasks, but it might even place orders for you without asking.

However, just at that moment, an exclusive leak from Alex Heath poured a bucket of cold water on the excitement—

The new Gemini's performance can, at best, only match OpenAI's GPT-5.5...

One Prompt, Four Options: Gemini's 'Laziness' Is Cured

Let's start with the good news.

Previously, when Gemini generated SVGs, the community's most common complaint was a single word: "lazy." One prompt would yield a half-hearted result.

But this time is different.

User Lentils used a single simple prompt, and Gemini directly output four Robot SVGs, each with a distinct style and rich in detail.

The concurrently leaked 3.5 Flash also corroborates this trend.

Anonymous scores from the LM Arena show that Flash has surpassed 3.1 Pro in SVG generation, interactive 3D coding, and animation processing.

In other words, Google's distillation and sparsification techniques are paying off, compressing frontier models into lightweight versions without a cliff-like drop in quality.

Managing Your Emails, Spending Your Money: Google's Agent is Bold

Another heavyweight leak from the same day is "Gemini Spark BETA."

According to the leak, Spark is positioned as "your everyday AI agent, on standby around the clock."

It's a 24/7 fully operational AI Agent that handles your inbox, executes online tasks, and manages multi-step workflows.

Spark's list of data sources is breathtaking.

Connected Google apps, skill modules, chat history, scheduled tasks, websites you've logged into, Personal Intelligence, and location information.

Gemini will share your name, contact details, files, preferences, and other information with third parties to complete tasks.

Furthermore, to maintain session continuity, the system will save remote browser data, including login credentials and remote code execution data.

It's worth noting, however, that while Spark is designed to ask for permission before sensitive operations, it "may share your information or complete purchases without asking."

In other words, it could place an order for you without asking, or share your information out without prompting you.

Spark's predecessor is an upgraded version of Google's internally codenamed Agent, "Remy", which was previously only available to AI Ultra subscribers.

From Remy to Spark, Gemini's Agent has evolved from "a feature" to a "24/7 digital life steward."

This directly competes with Anthropic's soon-to-be-released managed Agent Conway, and OpenAI's already online 24/7 Agent platform.

Topped the Charts Half a Year Ago, Now Doesn't Touch the Frontier's Edge

And that's the end of the good news.

According to confirmations Alex Heath received from multiple sources, the new Gemini set to launch next Tuesday falls roughly at the level of GPT-5.5, which is a noticeable gap from Mythos.

Think back: when first launched, Gemini 3 swept almost every major leaderboard's top spot with an LMArena score of 1501 Elo.

Half a year later, after the successive releases of GPT-5.5, Opus 4.7, and Mythos, the landscape has been completely rewritten.

Evaluations by the UK AI Safety Institute show that Mythos is the first model to pass both sets of its cybersecurity testing criteria, while GPT-5.5 only passed one set.

AISI even admitted that its evaluation framework is struggling to keep up with Mythos's capabilities.

Back on Google's side, according to a model selector interface dug up by user Fandu, the new Gemini likely natively supports third-party MCP tool integration, and the Thinking mode will undergo a comprehensive revamp.

As can be seen, besides the familiar models like 3.1 Flash-Lite, 3 Flash, and 3.1 Pro, there's a never-before-seen category: "MCP Tool Testing," meaning "models available for MCP tool testing."

The thinking mode has also been transformed from a standalone feature into a global toggle, divided into two levels: Standard (suitable for most problems) and Extended (for solving complex problems).

Coding: The Battlefield That Makes DeepMind Most Anxious

In Heath's leak, the wording about the coding section is the most emphatic.

He says DeepMind is facing real internal pressure, especially the need to catch up in coding ability.

The target to catch up with is clear: Anthropic. Over the past year, Claude has become the default choice among the developer community.

The new Gemini will include coding improvements, but among Heath's sources, no one believes it will bring a qualitative change.

Google's AI coding platform, Antigravity, is heavily used internally but has failed to break through in the external market.

A 6% developer adoption rate over 4 months isn't slow for an IDE, but the momentum gap is significant when compared to Claude Code and Codex.

What's the problem?

A monthly review from XDA tested three tools on the same task.

Claude Code accurately understood a complex creative prompt on the first try. Antigravity's output, however, looked like a doodle made with Microsoft Paint.

Furthermore, Antigravity's pricing strategy has been a headache for developers.

Google has adjusted its pricing model multiple times, from free previews to a credit point system, and the community forums are filled with complaints about running out of credits without warning.

But the most critical point is that AI coding has fully broken out of its niche.

Whether it's Claude Cowork or OpenAI's Codex, they allow people who can't write code to use them with soaring efficiency—

A product manager describes requirements in natural language and directly gets a runnable prototype. A designer drops in a Figma file and receives front-end code.

Yet so far, Google has had no product entering into this conversation.

However, a comment from prominent figure Haider offers another perspective.

Google may not intend to win by running on the same track as others; their greater focus is on building a more powerful multimodal system, which takes time.

The ASI Flywheel: All Three Hitting the Gas Simultaneously

Even if the model can't keep up, Google has a billion-scale distribution entry point and an always-on Agent.

Once Spark is rolled out, users' emails, schedules, shopping, and browsing data will feed back into the next generation of Gemini's training.

This is a strategy that is very difficult for OpenAI and Anthropic to replicate.

But the competitors are not idle.

Just yesterday, OpenAI added an ultrafast mode to Codex, increasing speed by 2-3 times, and launched a subsidy war: companies switching within 30 days get 2 months free. 2,000 developers responded within 3 hours.

Anthropic simultaneously released the Opus 4.7 Fast mode, increasing Claude Code quota by 50%.

This subsidy war appears to be a fight for developers on the surface, but the underlying logic is much deeper.

The development of GPT-5.6 can almost certainly be assumed to be occurring with deep involvement from GPT-5.5. Code written by AI feeds back into AI training; whoever controls the users of programming tools controls the accelerator for this cycle.

All three are flooring the accelerator on their respective tracks.

OpenAI relies on crushing iterative speed, with a new version every three weeks. Anthropic relies on being deified for model quality, with Mythos redefining the frontier. Google relies on envelopment through distribution and Agents, stuffing AI into the phones of a billion people.

No one is slowing down. The flywheel towards ASI has already begun to spin on its own.

And for those who use these tools every day, this three-way arms race might be the best bargain of 2026.

Subsidies are increasing, quotas are being raised, models are getting stronger, and prices are dropping.

The only question is: have you bet on the right track for your workflow?

References:
https://x.com/alexeheath/status/2054747125616169229
https://www.testingcatalog.com/google-prepares-gemini-spark-ai-agent-ahead-of-i-o-launch/
https://x.com/Lentils80/status/2054628116094501377

Chasing ASI at Light Speed

⭐ Like, Forward, and Favorite – One Click for All Three ⭐

Illuminate the star icon to lock in instant push notifications from Synced!

Gemini 3.5 Pro Leaked: Coding Performance Rivals GPT-5.5! Google Finally Gets Serious

Related Articles

分享網址