Google I/O 2026: Gemini 3.5 and a Full Suite of Agents Debut, Pushing Android Off the Table?

By Xiaojing | Edited by Xu Qingyang

In the early hours of May 20th (Beijing time), Google CEO Sundar Pichai did some math on the Google I/O 2026 stage. Google's top-tier clients are processing roughly 1 trillion tokens daily. If 80% of that workload were switched from other frontier models to Google's newly released Gemini 3.5 Flash, they could save over $1 billion a year.

The central theme of the 2026 Google I/O conference remained agents—from the agent platform (Antigravity) and consumer agent (Spark) to the search agent. Google aims to make agents a full-stack capability.

During the two-hour keynote, Google unveiled the new Gemini 3.5 model family, the multimodal world model Gemini Omni, the 8th generation TPU dual-chip architecture, and Antigravity 2.0, which has been upgraded from a coding tool to an agent management platform.

At the same time, an underlying thread came into sharper focus: the agentic AI era has reached its mid-game. The core battlefield for frontier models is shifting from competing to be the "smartest" or the "best" to driving the operating cost of agents below the threshold at which enterprises are willing to deploy them at scale.

01 Frontier Intelligence, Extreme Speed, and Half the Price?

Google launched the Gemini 3.5 model series, spearheaded by Gemini 3.5 Flash, which went live that very day.

Over the past few years, enterprises using generative AI have faced a painful dilemma. The most capable models are typically large and slow, with high query costs. Conversely, faster, cheaper models usually sacrifice accuracy.

Gemini 3.5 Flash promises to change that dynamic. Pichai said it has been a "game-changer" inside Google and a delight to use. He offered a tangible comparison: Gemini 3.5 Flash outperforms Gemini 3.1 Pro across the board, and 3.1 Pro was Google's top-tier flagship just four or five months ago.

In Pichai's own words: "Gemini 3.5 Flash is better than Gemini 3.1 Pro. It performs at roughly 90% of the frontier model level, is 4 times faster, can be up to 12 times faster on the Antigravity platform, and costs only one-third to one-half of the former."

Measured in output tokens per second, Gemini 3.5 Flash runs at four times the rate of comparable frontier models. Koray Kavukcuoglu, CTO of Google DeepMind, added that an optimized version can be up to 12 times faster at the same quality level, and that it is available on Google's agent development platform Antigravity starting May 19th (US local time).

In a series of demanding benchmarks, Gemini 3.5 Flash demonstrated strong agent and programming capabilities. It scored 76.2% on Terminal-Bench 2.1, reached 1656 Elo on GDPval-AA, and scored 83.6% on MCP Atlas. Its multimodal understanding was also strong, with an 84.2% score on CharXiv reasoning.

On the Intelligence vs. Speed chart published by the independent analysis firm Artificial Analysis, Gemini 3.5 Flash occupies a top-right position that no other model currently matches.

As Pichai put it, this proves "you no longer have to choose between quality and speed."

02 Live Demos: Complex Tasks, Multimodal, Interactive

Several demos at the conference visually showcased Gemini 3.5 Flash's ability to handle complex tasks. In one, Gemini 3.5 Flash was instructed to automatically rename and categorize a messy batch of asset files based on dynamic criteria. This wasn't simple keyword matching; the model needed to read each file's content, understand its actual use, and then archive it according to predefined classification logic. The entire process involved multiple judgment and execution steps, handled in seconds by Gemini 3.5 Flash.

This capability comes from the upgraded Antigravity platform, where multiple collaborating sub-agents work in parallel. Previously, this type of work might have taken a developer days to script or an auditor weeks to organize manually.

Another demo showcased Gemini 3.5 Flash's multimodal generation capabilities. In AI Studio, researchers uploaded an academic paper, and after reading and understanding it, the model directly generated an interactive animation explaining the core concepts.

Charts were no longer static; viewers could drag parameters and switch perspectives to observe dynamic relationships within the data. This direct conversion from text to interactive visual content is powered by Gemini 3's underlying multimodal foundation.

Search demos were equally impressive. In one example, a user entered a query about Gyroid patterns. Leveraging Gemini 3.5 Flash's enhanced agentic coding ability, the search result transformed from the traditional ten blue links into an interactive visualization page.

Users could rotate 3D structures and view different cross-sections on the page, all without navigating away. Liz Reid, who heads Google's search business, described this new search box as "the biggest upgrade since the debut of our iconic search box."

03 $190 Billion in Capital Expenditure and a Model That Saves Enterprises $1 Billion

For companies investing heavily in AI infrastructure, the most immediate impact of Gemini 3.5 Flash may be on cost. Pichai noted that many companies have already exhausted their annual token budgets even though it is only May. He positioned Gemini 3.5 Flash as a "financial lifeline" for enterprises struggling with runaway AI deployment costs at scale.

Agentic workflows are particularly token-hungry. Google's model APIs handle roughly 190 billion tokens per minute, and its own products process over 3.2 quadrillion tokens per month, a sevenfold increase in about a year. Two years ago at I/O, the monthly figure was just 9.7 trillion.
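A quick back-of-the-envelope check helps put these volumes on a common scale. The sketch below uses only the figures quoted above; the 30-day month is an assumption, and the API rate and the product total are separate measures that are not added together here.

```python
# Figures as quoted in the article; the 30-day month is an assumption.
API_TOKENS_PER_MINUTE = 190e9             # model APIs, tokens per minute
PRODUCT_TOKENS_PER_MONTH = 3.2e15         # Google's own products, tokens per month
PRODUCT_TOKENS_TWO_YEARS_AGO = 9.7e12     # monthly figure cited for two years earlier

MINUTES_PER_MONTH = 60 * 24 * 30          # assumed 30-day month


def api_tokens_per_month(tokens_per_minute: float) -> float:
    """Scale a per-minute token rate up to an assumed 30-day month."""
    return tokens_per_minute * MINUTES_PER_MONTH


def growth_multiple(current: float, previous: float) -> float:
    """Return how many times larger `current` is than `previous`."""
    return current / previous


if __name__ == "__main__":
    monthly_api = api_tokens_per_month(API_TOKENS_PER_MINUTE)
    two_year = growth_multiple(PRODUCT_TOKENS_PER_MONTH, PRODUCT_TOKENS_TWO_YEARS_AGO)
    print(f"API volume: {monthly_api:.2e} tokens per month")
    print(f"Product volume vs. two years ago: about {two_year:.0f}x")
```

Under these assumptions, the API rate works out to roughly 8.2 quadrillion tokens per month, and the product figure is about 330 times the level cited for two years ago.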

Against this backdrop, Gemini 3.5 Flash is priced at less than half the price of competing frontier models. Pichai crunched the numbers: for top clients processing around 1 trillion tokens daily on Google Cloud, shifting 80% of that workload to a combination of Flash and the frontier model could result in annual savings exceeding $1 billion. That is a figure large enough to reshape enterprise purchasing decisions and ROI calculations.
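To see what the $1 billion claim implies, one can work backward to the average price gap per million tokens that would be needed. This is a rough sketch under stated assumptions: a 365-day year, a single blended price per token (ignoring any input/output price split), and savings of exactly $1 billion as a lower bound. No actual price list is used.

```python
# Inputs quoted in the article; everything else is an assumption for illustration.
TOKENS_PER_DAY = 1e12        # roughly 1 trillion tokens daily
SHIFTED_SHARE = 0.80         # 80% of the workload is moved
TARGET_SAVINGS_USD = 1e9     # "over $1 billion" per year, taken as a lower bound
DAYS_PER_YEAR = 365          # assumed


def shifted_tokens_per_year(tokens_per_day: float, share: float,
                            days: int = DAYS_PER_YEAR) -> float:
    """Tokens per year that move to the cheaper model."""
    return tokens_per_day * share * days


def implied_price_gap_per_million(savings_usd: float, tokens_per_year: float) -> float:
    """Average price difference (USD per million tokens) needed to reach the savings."""
    return savings_usd / (tokens_per_year / 1e6)


if __name__ == "__main__":
    yearly = shifted_tokens_per_year(TOKENS_PER_DAY, SHIFTED_SHARE)
    gap = implied_price_gap_per_million(TARGET_SAVINGS_USD, yearly)
    print(f"Shifted volume: {yearly:.3e} tokens per year")
    print(f"Implied average price gap: ${gap:.2f} per million tokens")
```

Under these assumptions, about 292 trillion tokens a year would move, so the claimed savings correspond to an average gap of roughly $3.40 per million tokens between the old and new models.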

The foundation of Gemini 3.5's cost advantage is Google's infrastructure investment. Pichai revealed that Google's capital expenditure for 2026 is projected to be between $180 billion and $190 billion—roughly six times the $31 billion spent four years ago.

A significant investment area is custom chips. The 8th generation TPU debuts a dual-chip architecture, designed separately for training (TPU v8o) and inference (TPU v8i). The inference-optimized TPU v8i allows Google to run models at lower costs than competitors dependent on general-purpose GPUs, with the savings passed on to customers. Pichai noted, "This means larger, more capable models can be trained in weeks, not months."

04 Gemini Spark: Your Private AI Steward

Once a model is fast and cheap enough, it can evolve from passively answering questions to proactively performing tasks as an agent. To this end, Google introduced Gemini Spark.

Josh Woodward, VP of Google Labs and Gemini app, explained that Gemini Spark is an AI running 24/7 on a dedicated Google Cloud virtual machine. Even if you turn off your device, it continues working in the background. Gemini Spark deeply integrates with Gmail, Docs, Sheets, and Slides.

Woodward described the experience: "When you use it, it almost feels like you throw stuff over your shoulder, and Spark catches it and gets to work." Regarding specific capabilities, Woodward shared a few tester use cases: planning parties, tracking school schedules, and monitoring inboxes for issues.

For safety, Gemini Spark requires explicit user approval before executing high-risk actions. For spending, Google introduced an Agent Payment Protocol that lets users set strict boundaries: they can approve specific brands, set spending caps, and restrict which merchants the agent may buy from. Google plans to expand connectivity this summer, enabling Gemini Spark to operate more third-party apps and websites via the Chrome browser.
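The three kinds of boundary described here (brands, caps, merchants) can be pictured as a simple pre-purchase check. The sketch below is purely illustrative: the `SpendingPolicy` class, its fields, and the `authorize` method are hypothetical names invented for this example and do not reflect the actual Agent Payment Protocol, whose details the keynote did not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class SpendingPolicy:
    """Hypothetical user-defined limits; not Google's actual schema."""
    approved_brands: set[str] = field(default_factory=set)
    allowed_merchants: set[str] = field(default_factory=set)
    spending_cap_usd: float = 0.0
    spent_usd: float = 0.0

    def authorize(self, brand: str, merchant: str, amount_usd: float) -> bool:
        """Return True and record the spend only if every boundary holds."""
        within_cap = self.spent_usd + amount_usd <= self.spending_cap_usd
        if (brand in self.approved_brands
                and merchant in self.allowed_merchants
                and within_cap):
            self.spent_usd += amount_usd
            return True
        return False


if __name__ == "__main__":
    # Example values are made up for illustration.
    policy = SpendingPolicy({"AcmeBikes"}, {"shop.example"}, spending_cap_usd=200.0)
    print(policy.authorize("AcmeBikes", "shop.example", 150.0))  # within all limits
    print(policy.authorize("AcmeBikes", "shop.example", 100.0))  # would exceed the cap
```

The design choice worth noting is that the check fails closed: a purchase goes through only when the brand, the merchant, and the remaining budget all pass, which matches the "explicit approval before high-risk actions" stance described above.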

A batch of trusted testers got access this week. Next week, Gemini Spark rolls out in beta to Google AI Ultra subscribers in the US. AI Ultra is a new subscription tier launched simultaneously, priced at $100/month, targeting developers, tech leads, and advanced creators, offering priority Antigravity access, higher usage limits, and bundled Omni Flash access.

Gemini Spark is part of Google's broader consumer-focused strategy. Look at the user base first: Gemini app monthly active users have surged from 400 million a year ago to over 900 million. The "AI Mode" in Search hit 1 billion monthly active users just one year after launch, with queries doubling every quarter.

Simultaneously, Google rolled out two new services: an around-the-clock web monitoring intelligence agent that proactively tracks and alerts you to price changes, stock movements, or trending topics of interest; and an AI-powered universal shopping cart based on Google Wallet, supporting unified management and checkout across different e-commerce sites, eliminating the hassle of separate logins and payments.

05 Gemini Omni: A New Species

Alongside Gemini 3.5 Flash and Gemini Spark, Google also unveiled Gemini Omni—its first truly native multimodal model.

Kavukcuoglu specifically differentiated it from the existing video generation model Veo: "Veo is a text-to-video model, whereas Gemini Omni is a true multimodal-input, multimodal-output model." Gemini Omni accepts any combination of text, images, audio, and video as input and generates outputs in the same modalities. All processing occurs within a single unified model, rather than a patchwork of multiple systems.

Users can progressively edit and generate video through conversation; each instruction builds on the last, with the video evolving coherently as the dialogue progresses. Google executives' demos showed specific editing scenarios:

A user uploads an outdoor cycling video and issues the command "change the background to a snowy landscape." Gemini Omni replaces the entire environment while preserving the motion path of the cyclist and bike. Then the user says, "Change to a side-tracking camera angle," and the camera perspective adjusts correspondingly. Finally, the user requests, "Add a voiceover explaining this route," and the model generates the soundtrack and narration. The entire process is completed within a single conversation thread—no file exports, tool switching, or re-uploading needed.

Kavukcuoglu described broader applications: "You can imagine this building capabilities very much like tutorials when you're exploring something." Google particularly emphasized improvements in physics effects—gravity, kinetics, fluid dynamics—the details that determine whether video looks like real footage or AI-generated.

With OpenAI having dropped its video generation tool Sora earlier this year to free up compute resources, Google's launch of Gemini Omni at this moment serves as a public demonstration of its infrastructure strength. Kavukcuoglu also revealed the team once had an agent build a functional operating system from scratch (name undisclosed) to test the boundaries of Gemini 3.5 Flash's capabilities.

For content security, all Gemini Omni-generated content carries a Google SynthID digital watermark, and support for C2PA content credentials is being expanded. An AI content detection API was also launched on the Antigravity platform. Google announced that OpenAI, Kakao, and ElevenLabs will adopt SynthID as well. For enterprises with strict compliance requirements, this toolkit provides auditable provenance trails.

Gemini Omni is available to US Gemini paid subscribers as of today and will roll out to developers via Vertex AI API in the coming weeks. Google also launched a "personal avatar" program, allowing creators to record short videos and authorize the use of their voice and likeness in generated content. Google employees posting I/O-related content that day demonstrated their own AI-generated portraits.

06 Antigravity 2.0: A Platform for Developing and Managing Autonomous AI Agent Teams

Models need a platform to run on, so Google simultaneously released Antigravity 2.0. Just six months ago, it was merely a coding environment; it has now been transformed into a "platform for developing and managing autonomous AI agent teams."

Kavukcuoglu noted the team "developed Gemini 3.5 Flash together with our agent development platform, Google Antigravity." Flash's speed, tool usage, long-context reasoning, and code generation capabilities are specifically optimized for developer workloads on the platform.

Antigravity ships as a standalone desktop application and also offers a command-line interface and SDKs. Developers can orchestrate multiple agents simultaneously: one writes website code, another generates brand assets, a third plans product architecture. These agents work in parallel under central management.
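The general pattern of running several agents in parallel under one coordinator can be sketched with Python's standard `asyncio` library. This is not the Antigravity SDK: `run_agent` and `orchestrate` are hypothetical stand-ins, and each "agent" here simply returns a string instead of calling a model.

```python
import asyncio


async def run_agent(name: str, task: str) -> str:
    """Hypothetical stand-in for an agent; a real one would call a model and tools."""
    await asyncio.sleep(0)  # yield control so the other agents can make progress
    return f"{name}: finished {task}"


async def orchestrate(assignments: dict[str, str]) -> list[str]:
    """Start every agent concurrently and collect results in assignment order."""
    jobs = [run_agent(name, task) for name, task in assignments.items()]
    return await asyncio.gather(*jobs)


if __name__ == "__main__":
    # The three roles mirror the example in the text; the names are made up.
    results = asyncio.run(orchestrate({
        "coder": "website code",
        "designer": "brand assets",
        "architect": "product architecture",
    }))
    for line in results:
        print(line)
```

`asyncio.gather` returns results in the order the jobs were passed in, which gives the coordinator a single place to inspect every agent's output.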

Also introduced were Managed Agents and CodeMender. Managed Agents can be launched in isolated Linux environments with a single API call to perform reasoning, use tools, and execute code. CodeMender is a security agent that leverages Gemini's advanced reasoning to automatically discover and patch critical code vulnerabilities. Kavukcuoglu argued that as agentic systems write more and more code, this capability becomes essential.

Underpinning all this is a data flywheel. In March, developers were processing roughly 0.5 trillion tokens per day on Antigravity. By mid-May, that figure had skyrocketed to over 3 trillion—a sixfold increase in about ten weeks. Pichai said usage is doubling "nearly every couple of weeks."
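The two growth figures in this paragraph can be compared directly. The sketch below assumes smooth exponential growth over the whole period, which is a simplification; the token counts and the ten-week window come from the text, and the function names are illustrative.

```python
import math


def doubling_time_weeks(growth_multiple: float, elapsed_weeks: float) -> float:
    """Doubling time implied by a total growth multiple over a period,
    assuming steady exponential growth."""
    return elapsed_weeks * math.log(2) / math.log(growth_multiple)


def growth_over(elapsed_weeks: float, doubling_weeks: float) -> float:
    """Total growth multiple if usage doubles every `doubling_weeks` weeks."""
    return 2 ** (elapsed_weeks / doubling_weeks)


if __name__ == "__main__":
    multiple = 3e12 / 0.5e12                      # 0.5 trillion -> 3 trillion per day
    implied = doubling_time_weeks(multiple, 10)   # about ten weeks, per the article
    print(f"Growth multiple: {multiple:.0f}x")
    print(f"Implied doubling time: {implied:.1f} weeks")
    print(f"Growth if doubling every 2 weeks for 10 weeks: {growth_over(10, 2):.0f}x")
```

Under this assumption, a sixfold rise over ten weeks implies a doubling time of just under four weeks, whereas doubling every two weeks for the full ten weeks would give a 32-fold rise. The "every couple of weeks" remark therefore most plausibly describes the most recent stretch rather than the whole period.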

The flywheel logic is clear: the more engineers use it, the more real-world signals the model team collects; these signals feed back to improve the model, making it more useful, which in turn drives more usage. Pichai called it "a powerful feedback loop that allows us to continuously improve the 3.5 family models." Google's focus has been on "boosting model intelligence and ensuring everything—tool usage, instruction following, long-horizon tasks, agentic decoding—works flawlessly."

07 Iterating Every Six Months

Gemini 3.5 Flash is just the start. Kavukcuoglu indicated that Gemini 3.5 Pro is in internal testing and will launch next month. He also clarified the update cadence for Google's major models: roughly every six months. Looking back, Gemini 3 launched last November and Gemini 3.5 this May, so the rhythm is settling in. When asked how version numbering was decided, he explained, "What dictates the version bump is really the progress we see in research and how that manifests in the models and the impact it brings."

For buyers, this predictable, rapid iteration cycle changes planning. A model that can outperform its previous flagship at one-third the cost every six months means that a token budget that feels tight today might feel generous by year-end. Enterprises can no longer assess cost-performance with a static lens when building their technical roadmaps.
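The planning effect can be made concrete with a small calculation. The sketch below assumes a deliberately simple step model: the price per token falls by a fixed factor at each completed six-month cycle and stays flat in between. The factor of 3 mirrors the "one-third the cost" scenario in the text; it is an illustration, not a forecast, and the function name is hypothetical.

```python
MONTHS_PER_CYCLE = 6   # from the article: a major model roughly every six months


def budget_multiplier(months_elapsed: int, price_drop_factor: int = 3) -> int:
    """How many times more tokens a fixed budget buys after `months_elapsed`,
    assuming the price falls by `price_drop_factor` at each completed cycle."""
    completed_cycles = months_elapsed // MONTHS_PER_CYCLE
    return price_drop_factor ** completed_cycles


if __name__ == "__main__":
    for months in (5, 7, 12):
        print(f"After {months} months: {budget_multiplier(months)}x the tokens")
```

From mid-May to the end of the year is about seven months, which covers one completed cycle, so under this model the same budget would buy three times as many tokens by year-end and nine times as many after a full year.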

Of course, the $1 billion figure is still a PowerPoint projection. Legacy systems, compliance requirements, and organizational inertia surface in every technology cycle and often cause on-paper cost curves to "break" during implementation. However, Google is signaling that, with an internal usage volume of 3 trillion tokens daily that doubles every few weeks, it is stress-testing this bet itself at a scale no customer has yet attempted.

What new face will AI wear one year from now?