Google AI Conquers 6 World-Class Problems, More Shocking Than an IMO Gold Medal! Terence Tao Points the Way to a New Game

Google DeepMind's latest AI agent Aletheia independently conquered 6 world-class math problems in the FirstProof challenge, achieving a qualitative leap from competition level to PhD research level. The "manual era" of human mathematical research may be entering a countdown.

Just now, one of the last defense lines of the human mathematical world collapsed!

Even the onlookers were shocked: AI not only knows how to solve problems, but it can now independently handle PhD-level pure mathematics research.

In the past couple of days, Google DeepMind's latest AI research agent, Aletheia, knocked down 6 out of 10 recognized world-class unsolved math problems in one go at a summit challenge in the math world called "FirstProof"!

DeepMind executive Thang Luong posted on X, unable to hide his excitement:

"To me, this is even more significant than last year's historic achievement of winning an IMO gold medal!"

This is no ordinary math competition. Bear in mind, even the world's top mathematicians find these problems extremely tricky.

As a result, Aletheia not only calculated the answers independently, but Jim Fowler, the mathematician who proposed the conjecture for Problem 7, even personally came out to verify:

"The AI's problem-solving process is completely correct."

Even Terence Tao, arguably the most brilliant mathematician alive today, said in a recent interview that AI has already become his "junior collaborator."

Aletheia's "God's Move": Brute Force Deduction

How powerful is Aletheia, really?

Let's see what Thang Luong, Chief Scientist and Research Director at Google DeepMind and head of the Super Reasoning team, has to say:

"Super excited! Our mathematical research AI agent #Aletheia just fully and independently solved 6 out of 10 notoriously difficult FirstProof challenge problems, directly taking the inaugural Best in Show!"

Savor the weight of this sentence.

Luong is blunt:

"In my view, this carries much more weight than our historic moment last year reaching the IMO (International Mathematical Olympiad) gold medal level!"

Because these are notoriously hard problems that even today's top mathematicians find deeply vexing.

This time, DeepMind ran two versions of Aletheia built on Gemini 3 DeepThink (differing only in the underlying model).

After cross-checks and majority expert review, the two versions together took down 6 of the 10 problems (specifically Problems 2, 5, 7, 8, 9, and 10).

Bear in mind, the grading and evaluation of this problem set is itself hellishly difficult, because the experts who can even understand these problems are vanishingly rare.

But it is precisely for this reason that DeepMind's research process was rigorous to the point of paranoia:

The entire solution process was run purely by the machine, with "zero human intervention" throughout, and submitted entirely within the deadline set by FirstProof.

This is a milestone moment.

It is no longer a matter of humans feeding in equations step by step: the agent has learned to gnaw on an extremely complex research problem for a long stretch, run into thousands of dead ends, and finally come back to report, matter-of-factly, "I got it" (or "I failed").

DeepMind even fully visualized the computing power (reasoning cost) Aletheia burned through in the process.

The most explosive part was undoubtedly the amazing comeback of Problem 7 (P7).

This is an atypical problem that no one has been able to solve for several years.

According to Tony Feng, an expert in the field, in this competition, apart from Aletheia, no AI could even get close to the correct answer.

When the run first started, even the DeepMind team thought Aletheia stood no chance this time, but it actually produced the correct answer!

To conquer P7, Aletheia invested massive computing power—exactly 16 times that used to solve the Erdős-1051 problem!

Sang Hyun Kim, an authority in the math world, gave extremely high praise after reviewing the AI's solution steps:

"This is the first time in my life seeing AI flawlessly connect and apply several extremely profound mathematical theorems. This is definitely a unique and rare case!"

DeepMind's full write-up and experimental details for FirstProof are here:

Paper Address: https://arxiv.org/abs/2602.21201

Not Talking Nonsense Is AI's Strongest Confidence

If you dig deep into DeepMind's paper, you will find that the fundamental reason Aletheia is so stable is that it has mastered a key skill: "Self-filtering".

Traditional AI large models have a bad habit of pretending to understand when they don't (hallucinations).

No matter what you ask, it will solemnly make up an answer for you.

But in high-stakes, research-level settings, handing mathematicians a pile of output that looks perfectly reasonable but doesn't hold up is worse than giving them nothing at all.

How did DeepMind solve this problem?

They designed two "sub-personalities" inside Aletheia:

One is the "Generator", responsible for thinking freely and wildly conjecturing solution paths; the other is a cold-blooded "Verifier", responsible for picking apart everything the Generator produces.
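Conceptually, this Generator/Verifier split is the classic generate-and-verify loop. Below is a deliberately toy sketch of the pattern (my own illustration with invented function names, using integer factoring as a stand-in for proof search; this is not DeepMind's code): the agent only reports answers its verifier has certified, and otherwise abstains, mirroring Aletheia's "No solution found" behavior.

```python
import random

def generator(n, rng):
    """Generator: propose a candidate nontrivial factor of n (may be wrong)."""
    return rng.randrange(2, n)

def verifier(n, d):
    """Verifier: accept the candidate only if it actually divides n."""
    return n % d == 0

def solve(n, budget=10_000, seed=0):
    """Run generate-and-verify within a compute budget; abstain on failure."""
    rng = random.Random(seed)
    for _ in range(budget):
        cand = generator(n, rng)
        if verifier(n, cand):
            return cand          # a verified answer
    return None                  # abstain: "No solution found"

print(solve(91))   # a verified factor of 91 (7 or 13)
print(solve(97))   # 97 is prime, so the agent abstains with None
```

The key design point is the asymmetry: proposing a candidate may be wildly unreliable, but checking it is cheap and exact, so whatever the agent does report is trustworthy even when the generator is not.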

Inside the black box of problem solving, these two subsystems fight frantically.

When it hit the 4 problems it couldn't solve, Aletheia didn't bluff its way through: it either reported "No solution found" outright, or simply went quiet when the time limit was reached.

Not making things up, and never wasting human experts' energy when unsure: this is exactly where top scholars trust Aletheia the most.

As written in the paper: "To improve accuracy, we prefer to sacrifice its ability to solve certain problems."

In terms of cost, apart from P7, the "divine problem" that consumed 16 times the compute, the "brain power" spent on each of the other solved problems also far exceeded the peak spent on the Erdős-1051 problem last year.

The complete interaction logs and solution processes (right and wrong alike, raw and fully public) are available here:

GitHub Address:

https://github.com/google-deepmind/superhuman/tree/main/aletheia

Which "Monster Problems" Did Aletheia Tear Apart?

Let's start with the much-discussed P7.

Problem Background: Algebraic Topology / Differential Geometry. Determine whether a uniform lattice in a semisimple Lie group that contains an element of order 2 (a 2-torsion element) can be the fundamental group of a compact boundaryless manifold whose universal cover is acyclic in rational homology.

Answer: Impossible.

AI God-tier Solution:

Proof Idea 1: Pure Topological Method (Lefschetz number contradiction)

Using the condition that the universal covering is Q-acyclic, calculate that the compactly supported Lefschetz number of the order 2 element γ must be non-zero; but γ acts freely (no fixed points), which through the multiplicativity of the Euler characteristic implies the Lefschetz number must be zero. 0 = ±1, contradiction.

Proof Idea 2: Geometric Method (Rigidity of symmetric spaces)

Using the geometric structure of the lattice, construct an equivariant map from the universal covering to the symmetric space, proving that the Lefschetz numbers of γ on both sides must be equal. But on the universal covering side it is zero (free action), and on the symmetric space side it is non-zero (Cartan Fixed Point Theorem guarantees a fixed point). Contradiction again.

What's good about it?

Proof 1 is good because of "simplicity". The problem gave a bunch of conditions, but none were used. It solved the problem relying only on the most basic topological tools, and actually proved a stronger conclusion: any discrete group with torsion won't do. The chain is extremely short: calculate the Lefschetz number, one side is non-zero and the other is zero, contradiction, end.

Proof 2 is good because of "depth". It used all the geometric conditions given in the problem, constructed the map from the universal covering to the symmetric space, and finally used the Cartan Fixed Point Theorem on the symmetric space to find a contradiction. This path is longer, but answers a more essential question.
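The arithmetic behind Proof Idea 1 fits in one display. (This is a paraphrase of the standard Lefschetz-number argument; the symbols $\widetilde{M}$ for the universal cover and $\gamma$ for the order-2 element are my own notation, not quoted from the paper.)

```latex
L(\gamma) \;=\; \sum_{i} (-1)^{i}\,
  \operatorname{tr}\!\bigl(\gamma_{*} : H_{i}^{c}(\widetilde{M};\mathbb{Q})
  \to H_{i}^{c}(\widetilde{M};\mathbb{Q})\bigr)
```

Rational acyclicity leaves essentially a single nontrivial group, on which $\gamma_{*}$ acts as the identity, forcing $L(\gamma) = \pm 1$; but a free action of a finite-order element forces $L(\gamma) = 0$. The contradiction $0 = \pm 1$ is the entire proof.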

Problem Background: Number Theory/Representation Theory. In representations of matrix groups over non-Archimedean local fields, prove the existence of a universal Whittaker function such that the local Rankin–Selberg integral is non-zero for all paired representations.

Answer: Yes. Such a "universal" W exists.

AI God-tier Solution:

First choose a special Whittaker function W so that the integral domain is compressed onto a compact set, the complex parameter s disappears completely, and the problem simplifies to proving that a finite functional is non-zero. Then use proof by contradiction: assume it is zero for all V, through finite Fourier analysis deduce that the test function has "translation invariance", which would force the representation π to have an invariant vector under a subgroup coarser than its conductor, contradicting the definition of the conductor.

What's good about it?

The most critical part of the whole proof is the first step of choosing the Whittaker function W. This one choice achieved three things at once: 1) compressed the integral domain onto a compact set, 2) eliminated the complex parameter s, and 3) turned an infinite-dimensional analytical problem into a finite-dimensional algebraic problem. Moreover, this W does not depend on the paired representation π—the same choice works for all π, which is extremely rare in representation theory.

The "level lowering" part of the proof by contradiction is also wonderful: assuming the functional is identically zero, through finite Fourier analysis it is deduced step by step that the test function is invariant modulo p^{c-1}, but the conductor of π is exactly p^c, and there cannot be an invariant vector at this level. The contradiction lands exactly on the definition of the conductor, not one step more, not one step less.
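For orientation only: in its standard textbook form for $\mathrm{GL}_n \times \mathrm{GL}_{n-1}$ (which may differ in normalization and detail from the FirstProof statement, so treat this as an assumption rather than a quotation), the local Rankin–Selberg integral has the shape

```latex
\Psi(s, W, W') \;=\;
  \int_{N_{n-1}\backslash \mathrm{GL}_{n-1}(F)}
    W\!\begin{pmatrix} g & 0 \\ 0 & 1 \end{pmatrix}\, W'(g)\,
    \lvert \det g \rvert^{\,s - \frac{1}{2}} \, dg
```

and the problem asks for a single Whittaker function $W$ such that $\Psi(s, W, W')$ is not identically zero for every paired representation on the other side. The "special choice of W" in the solution is exactly what collapses this integral onto a compact set and kills the dependence on $s$.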

For other problems, interested readers can consult the paper and GitHub project themselves.

Humans Can No Longer Keep Up with the Speed of Problem Creation

Why is it mathematics specifically that has become the ultimate arena for testing AI's strength?

The reason is simple—math answers are black and white, right is right and wrong is wrong, there is no room for humans to "show mercy" and give "sympathy points".

But here is the problem: the test papers are now being written far more slowly than AI can fill them in.

In November 2024, Epoch AI launched the FrontierMath benchmark, designed specifically to probe the limits of frontier AI's mathematical reasoning.

When it first launched, the strongest AI couldn't even do 2% of the problems. As of today, GPT-5.2 and Claude Opus 4.6 can already handle more than 40% of the basic problem bank, and even for the 50 ultimate difficulty Level 4 challenge problems, the accuracy rate has broken through 30%.

However, no matter how hard FrontierMath is, it is essentially a case of "humans already hold the standard answers; let's see whether AI can find them too". Strictly speaking, it is still an exam.

But the 10 problems in FirstProof were dug out from their own real research by 11 top mathematicians, difficult problems that had never been published before.

Project Homepage: https://1stproof.org/

And the ending of this challenge was full of drama.

  • After the problems were released on February 6, professional scholars, amateur enthusiasts, and the major AI labs all entered the arena.
  • When the answers were revealed on February 14, no individual or team had solved them all.
  • The problem setters themselves then ran a round with Gemini 3.0 Deep Think and ChatGPT 5.2 Pro, solving only 2.
  • Finally, OpenAI's strongest internal system solved 5 under limited human supervision.

By contrast, this shows just how valuable Aletheia's 6 solutions, achieved with zero human intervention, really are.

The math world has mixed feelings: some call it earth-shattering, while others point out that 4 of the 10 remain unsolved, so AI is still far from replacing mathematicians.

But an irreversible trend is already in front of everyone—

We need harder problem banks to test AI, and we need them fast, because every existing benchmark is expiring at a visible rate.

Epoch AI obviously realized this too.

Just as FirstProof started, they released their big move—FrontierMath: Open Problems.

This brand-new problem bank contains 16 genuine open problems on which professional mathematicians have struggled and, so far, come up empty.

Even better, although there are no standard answers, Epoch AI wrote an automatic grading program for each problem to judge whether an AI's solution is valid.

Since its launch, no AI has solved even a single one—this "zero score" status actually proves the value of the problem bank.
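As a miniature illustration of how a problem can be auto-graded without the grader knowing any answer in advance, here is a hypothetical sketch (the task and all names are invented for illustration; this is not Epoch AI's actual grader): the grader only needs to be able to certify a claimed solution mechanically.

```python
# Invented example task: "exhibit a 3x3 magic square of distinct
# positive integers with magic constant 15". The grader never stores
# an answer; it only checks whatever the AI submits.

def grade_magic_square(sq):
    """Return True iff sq is a valid 3x3 magic square with constant 15."""
    if len(sq) != 3 or any(len(row) != 3 for row in sq):
        return False
    flat = [x for row in sq for x in row]
    if len(set(flat)) != 9 or any(x <= 0 for x in flat):
        return False                      # entries must be distinct positives
    rows = [sum(row) for row in sq]
    cols = [sum(sq[i][j] for i in range(3)) for j in range(3)]
    diags = [sum(sq[i][i] for i in range(3)),
             sum(sq[i][2 - i] for i in range(3))]
    return all(s == 15 for s in rows + cols + diags)

submission = [[2, 7, 6], [9, 5, 1], [4, 3, 8]]
print(grade_magic_square(submission))       # True: submission certified
print(grade_magic_square([[1, 2, 3]] * 3))  # False: rejected
```

The same principle scales to genuinely open problems: as long as a claimed certificate (a construction, a counterexample, a formal proof object) can be verified by a program, a "zero score" is a meaningful measurement rather than a grading failure.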

The FirstProof team doesn't intend to stop either, having officially announced a second round of challenges, with even more brutal difficulty, launching on March 14.

Terence Tao: AI is My "Junior Collaborator"

So, how do the people standing at the absolute peak of the math world view this storm?

In a recent interview, Terence Tao gave an extremely precise positioning: AI is now his "junior collaborator".

In 2023, he predicted that AI would reach the level of a paper co-author by 2026. At that time, there were mixed reviews, but looking at the progress now, it completely matches and is even slightly ahead of schedule.

But more important than this title is the new paradigm of mathematical research described by Terence Tao.

He said traditional math research is like "case studies", where a paper grinds on one or two problems to death. This has been the way mathematicians have worked for centuries. But AI is giving mathematicians the ability to do "large-sample surveys" for the first time.

At the same time, there are a large number of extremely tedious calculations in mathematical research that humans hate to do, so mathematicians will rack their brains to find clever ways to bypass them. But AI doesn't mind the trouble; it is willing to run through these boring deductions tirelessly.

When AI is woven into human workflows, these once-daunting obstacles are simply stepped over.

In another dimension, AI also shows a unique skill—it can systematically scan the long tail of problems that humans simply don't have the energy to touch.

Take the more than 1,000 math problems left by Erdős as an example. AI can go through them from beginning to end, pick out solvable problems, and break them one by one.

Humans can't do this, but AI can, and is already doing it.

Terence Tao even admitted that he learned things from the AI's problem-solving process:

Maybe it used a little trick from a 1960 paper I had never seen; it will try things that human experts glance at and can't be bothered to attempt.

The Next Countdown Has Already Begun

Looking back at this whole storm, a clear main line has surfaced:

From FrontierMath being quickly cleared, to Aletheia taking 6 problems with zero human intervention on FirstProof, to Terence Tao personally admitting that AI is already his "junior collaborator".

All signals are pointing to the same fact:

AI is embedding itself into the core workflow of human mathematical research in an irreversible posture.

And the most intriguing thing of all is Epoch AI's Open Problems bank, which still stands at zero solved.

Its existence is a metaphor in itself:

The last weapon humans can use to test AI now is problems whose answers they don't even know.

How long can this defensive line hold? No one dares to guarantee.

But one thing is almost certain—

At the moment when the FirstProof second round challenge opens on March 14, all the numbers in this article today may already be outdated.

References:

https://x.com/rohanpaul_ai/status/2026559039241597070?s=20

https://www.theatlantic.com/technology/2026/02/ai-math-terrance-tao/686107/


AINews · AI News Aggregation Platform
© 2026 AINews. All rights reserved.