Math Major, in Crisis! Fields Medalist Tests ChatGPT 5.5 Pro, Achieves Thesis-Level Result in 17 Minutes

Jay Wenle, reporting from Aofei Temple | QbitAI (Official Account: QbitAI)

"If AI's mathematical ability continues to develop at the current pace, we (mathematical researchers) will soon face a crisis."

Timothy Gowers, a recipient of math's highest honor, the Fields Medal, sounded a red alert for students after his latest experience with ChatGPT 5.5 Pro.

"The impact on doctoral students is particularly urgent."

Here's what happened: this Cambridge math luminary recently got his hands on the exclusive "priority pass" for ChatGPT 5.5 Pro.

With his new toy, Gowers casually tossed a few open problems in additive number theory at the AI, just to see what would happen.

But what followed completely exceeded his expectations.

In under two hours, ChatGPT independently produced a mathematical result that he considers "fully worthy of a PhD thesis."

Throughout the entire process, Professor Gowers provided no mathematical guidance whatsoever.

The only things he needed to do were:

"Hmm, that's a nice idea. Why don't you try developing it?"

"Great, can you write that up in LaTeX preprint format for me?"

In that moment, Gowers truly felt the suffocating anxiety of today's young generation:

When AI can independently tackle problems of this caliber, what future is there for young mathematicians pursuing their doctorates?

Even he couldn't offer a clear answer.

The only thing to do is to find a new path for students as quickly as possible.

Before AGI truly arrives, we must rediscover the real value of learning mathematics and pivot swiftly.

"Mathematics departments that bear a responsibility to their students should urgently prepare for this."

But don't panic just yet, because another Fields Medalist, Terence Tao, has a lot to say to everyone.

After all, he's a true pioneer at the intersection of AI and mathematics, and recently co-founded an AI4S organization precisely to help young people find new directions in the AI era.

Coincidentally, Tao has just shared his latest insight:

The "digestion" of mathematical proofs is where the most irreplaceable value of human mathematicians lies in the age of AI.

Two of the world's top mathematicians, facing the same storm, offer different angles of reflection.

Compared to Tao, however, Gowers's reaction this time might be more striking.

Tao, being an "AI veteran" (haha), is relatively calm about it.

Gowers, on the other hand, was truly "shaken" (not really) and promptly fired off a super long essay.

It really is very, very long...

Below is a condensed version that is easier to read.

Enjoy.

A Fields Medalist's Math Experiment with ChatGPT 5.5 Pro

The starting point of the story was actually a rather interesting paper.

Additive number theory expert Mel Nathanson wrote a paper listing a bunch of open problems concerning the sumset properties of integer sets.

Problems like these are well defined, moderately difficult, and plentiful: ideal practice material for new doctoral students working toward their first journal publication.

Instead, Gowers used them to test ChatGPT 5.5 Pro.

He gave the AI a problem along these lines:

Given a set A of k integers (|A| = k) and a prescribed size for its sumset 2A (the set of all sums a + b with a, b in A, where an element may be added to itself), what is the smallest possible diameter of A, that is, the gap between its largest and smallest elements?
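To make the question concrete, here is a small Python sketch (an illustration only, not code from Gowers's post or Nathanson's paper) that computes 2A and the diameter, and finds the minimum diameter for very small k by brute force. The helper names `sumset`, `diameter` and `min_diameter` are our own, and the search is exponential in k, so it is only usable for tiny examples.

```python
from itertools import combinations, combinations_with_replacement

def sumset(A):
    """2A = {a + b : a, b in A}; an element may be added to itself."""
    return {a + b for a, b in combinations_with_replacement(sorted(A), 2)}

def diameter(A):
    """Largest element minus smallest element."""
    return max(A) - min(A)

def min_diameter(k, s, max_n=40):
    """Brute-force the smallest diameter N of a k-element set A with |2A| = s.

    After translating, we may assume min(A) = 0 and max(A) = N, so we scan
    N = k-1, k, ... and try every choice of the k-2 interior elements.
    Returns (N, A) for the first hit, or None if nothing is found up to max_n.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    for n in range(k - 1, max_n + 1):
        for middle in combinations(range(1, n), k - 2):
            A = (0, *middle, n)
            if len(sumset(A)) == s:
                return n, A
    return None

print(min_diameter(3, 6))   # (3, (0, 1, 3))
print(min_diameter(4, 10))  # (6, (0, 1, 4, 6))
```

For k = 4 and the largest possible sumset size, 10, the search returns {0, 1, 4, 6}, a set in which every sum of two elements is different.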

Nathanson himself had already proven an exponential upper bound (2^k-1), but always suspected it could be improved.

ChatGPT 5.5 Pro thought for 17 minutes and 5 seconds.

It then produced a construction achieving a quadratic upper bound, which is the best possible order of growth.
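Why can't one do better than quadratic? In the extreme case where the sumset is as large as possible, a short counting argument (a standard observation, added here for illustration rather than taken from Gowers's post) already forces a quadratic diameter:

```latex
\begin{aligned}
&\text{Shift } A \text{ so that } \min A = 0 \text{ and } \max A = N = \operatorname{diam}(A).\\
&\text{Then } 2A \subseteq \{0, 1, \dots, 2N\}, \text{ hence } |2A| \le 2N + 1.\\
&\text{If } |2A| = \tfrac{k(k+1)}{2} \text{ (the largest possible value, attained exactly by Sidon sets), then}\\
&N \;\ge\; \tfrac{1}{2}\Bigl(\tfrac{k(k+1)}{2} - 1\Bigr) \;=\; \tfrac{(k-1)(k+2)}{4}.
\end{aligned}
```

So a k-element set with the largest possible sumset must have diameter at least roughly k²/4, which is why a quadratic upper bound is the right order of growth, at least in this extreme case.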

Its core idea was to combine Sidon sets (sets in which all sums a + b with a ≤ b are distinct, which makes the sumset as large as possible) with arithmetic progressions.

To put it simply, it's like building with blocks. The AI selected two special types of blocks.

One is the Sidon set, in which every sum of two elements is different, so the sumset is as large as possible.

The other is the arithmetic progression we all learned about in school. By combining the two cleverly, it built a set that meets the required conditions with an essentially optimal diameter.

Nathanson's original proof used induction, essentially doing a similar block-combining task, but using a less efficient Sidon set based on powers of two.

But just as building a small house out of oversized blocks wastes space, this approach produced an exponentially large diameter.

ChatGPT 5.5 Pro simply swapped in a known, more efficient type of Sidon set.

The diameter of this set grows only quadratically (roughly on the order of k² for k elements), which is orders of magnitude smaller than exponential growth (2ᵏ). It is like building the house from small, precisely cut blocks, so that almost no space is wasted.
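The article does not say which Sidon family ChatGPT used. As one classical example, the sketch below compares the powers-of-two Sidon set with the Erdős–Turán construction {2pa + (a² mod p) : 0 ≤ a < p} for a prime p, which has p elements, all below 2p². The function names are hypothetical helpers for this illustration only.

```python
from itertools import combinations_with_replacement

def is_sidon(A):
    """True if all sums a + b (a <= b, an element may be used twice) are distinct."""
    sums = [a + b for a, b in combinations_with_replacement(sorted(A), 2)]
    return len(sums) == len(set(sums))

def powers_of_two(k):
    """The Sidon set {1, 2, 4, ..., 2^(k-1)}: simple, but exponentially spread out."""
    return [2 ** i for i in range(k)]

def erdos_turan(p):
    """Erdős–Turán construction: for a prime p, {2*p*a + (a*a % p) : 0 <= a < p}
    is a Sidon set with p elements, all below 2*p**2."""
    return [2 * p * a + (a * a) % p for a in range(p)]

def diameter(A):
    return max(A) - min(A)

for p in (5, 7, 11, 13, 17):
    pw, et = powers_of_two(p), erdos_turan(p)
    assert is_sidon(pw) and is_sidon(et)
    print(f"k={p}: powers of two diam={diameter(pw)}, Erdős–Turán diam={diameter(et)}")
```

Here the diameters are 2^(k−1) − 1 for the powers of two and 2k(k−1) + 1 for the Erdős–Turán set with k = p elements. For very small k the powers of two are actually more compact (15 versus 41 at k = 5), but among the primes tested the quadratic construction wins from k = 11 onward, and the gap widens rapidly (65535 versus 545 at k = 17).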

Some might say, isn't that just re-shuffling existing mathematical tools?

And you'd be right.

But Gowers himself admitted that a considerable amount of human mathematical research essentially involves combining existing knowledge and proof techniques.

The key point is, Nathanson himself didn't think of this step, but ChatGPT did.

Gowers then asked a related, harder question:

Replace the sumset with the restricted sumset, in which an element may not be added to itself (only sums a + b with a ≠ b count), and keep all other conditions the same. What is the minimum diameter now?
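The difference between the two sumsets is easy to see in code. In this minimal Python sketch (helper names are our own), the restricted sumset drops the "doubled" sums 2a, so the set of sums can shrink:

```python
from itertools import combinations, combinations_with_replacement

def sumset(A):
    """2A: all a + b with a, b in A; an element may be added to itself."""
    return {a + b for a, b in combinations_with_replacement(sorted(A), 2)}

def restricted_sumset(A):
    """Restricted sumset: only a + b with a != b."""
    return {a + b for a, b in combinations(sorted(A), 2)}

A = {0, 1, 2, 4}
print(sorted(sumset(A)))             # [0, 1, 2, 3, 4, 5, 6, 8]
print(sorted(restricted_sumset(A)))  # [1, 2, 3, 4, 5, 6]
```

The sums 0 = 0 + 0 and 8 = 4 + 4 disappear in the restricted version. For an arithmetic progression of k terms the restricted sumset has exactly 2k − 3 elements; for example, {0, 1, 2, 3, 4} gives {1, ..., 7}.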

This problem was unsurprisingly solved as well.

He then asked ChatGPT to combine the two results into a short note, and 47 minutes later a LaTeX preprint in standard format was ready.

Then things got even more interesting. Gowers raised the difficulty and asked about the same diameter problem for h-fold sumsets hA (all sums of h elements of A) in general.

This problem is much harder: for general h, we do not even fully know which sumset sizes are achievable, so the basic framework for constructions is missing.

However, MIT student Isaac Rajagopal had already done pioneering work here, proving an exponential upper bound on the diameter for h-fold sumsets.
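For readers who want the definition of the h-fold sumset hA in executable form, here is a minimal Python sketch (our own illustration; the helper names are hypothetical). For a k-element integer set, |hA| always lies between h(k − 1) + 1, attained for example by arithmetic progressions, and C(k + h − 1, h), the number of ways to choose h elements with repetition.

```python
from itertools import combinations_with_replacement
from math import comb

def h_fold_sumset(A, h):
    """hA = {a_1 + ... + a_h : each a_i in A}; elements may repeat."""
    return {sum(c) for c in combinations_with_replacement(sorted(set(A)), h)}

def size_range(k, h):
    """Smallest and largest possible |hA| for a k-element set of integers."""
    return h * (k - 1) + 1, comb(k + h - 1, h)

A = [0, 1, 3]
print(sorted(h_fold_sumset(A, 3)))  # [0, 1, 2, 3, 4, 5, 6, 7, 9]
print(size_range(len(A), 3))        # (7, 10)
```

Here 3A has 9 elements, inside the possible range of 7 to 10; the sum 3 appears both as 0 + 0 + 3 and as 1 + 1 + 1, which is why the maximum of 10 is not reached.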

Gowers wanted to see whether GPT 5.5 Pro could improve on Rajagopal's work. Unexpectedly, the AI made two leaps in a row and even came up with a construction based on k-dissociated sets.
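The article does not define the k-dissociated sets that ChatGPT used, so this sketch does not attempt them. For orientation, it checks only the standard notion from additive combinatorics: a set is dissociated if all of its subset sums are distinct. This is our own illustration, not the construction from Gowers's post.

```python
from itertools import combinations

def is_dissociated(A):
    """Standard notion: all 2^|A| subset sums of A are distinct."""
    A = sorted(set(A))
    seen = set()
    for r in range(len(A) + 1):
        for subset in combinations(A, r):
            s = sum(subset)
            if s in seen:
                return False  # two different subsets share a sum
            seen.add(s)
    return True

print(is_dissociated([1, 2, 4, 8]))  # True: every subset sum is a distinct binary number
print(is_dissociated([1, 2, 3]))     # False: 1 + 2 == 3
```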

Here's what happened next, organized by timeline:

Round 1: ChatGPT thought for 16 minutes and 41 seconds, and based on the innovative idea of dissociated sets, improved the upper bound from exponential to sub-exponential.

Round 2: Gowers asked it to write a preprint, which took 47 minutes and 39 seconds.

Round 3: Isaac himself reviewed it and thought the argument looked correct. The logical reasoning was rigorous, and the use of k-dissociated sets was clever.

Round 4: Gowers got greedy and asked ChatGPT whether it could be pushed further to a polynomial bound.

Round 5: ChatGPT thought for 13 minutes and 33 seconds and proposed that a refined version of the k-dissociated construction could do it, though a few technical details still needed checking.

Round 6: Gowers asked it to verify on its own. It resolved the core sticking points 9 minutes and 12 seconds later.

Round 7: Writing up the preprint took 31 minutes and 40 seconds.

Round 8: Isaac reviewed it again, judging the conclusion to be basically valid. He specifically pointed out that it wasn't just line-by-line correct, but also correct at the conceptual level—meaning ChatGPT indeed contributed new ideas.

Throughout the whole process, Gowers's mathematical input was zero.

All he did was act as a (mathematical) project manager:

Posing requirements, confirming direction, demanding delivery.

The mathematics itself was all done by ChatGPT.

AI Raises the Entry Barrier for Math Doctoral Students

If this were just a cool demo, it would be one thing.

But what Gowers saw were two looming crises.

First, a very practical question: what should be done with this AI-generated result?

If a human mathematician had done this, it would definitely be publishable.

But here the primary work was done by AI.

arXiv has explicitly rejected AI-generated content, and traditional journals certainly won't accept it.

So where should it go?

Gowers himself proposed an idea: perhaps we should build a dedicated repository for AI mathematical results, with some review process.

For instance, requiring a human mathematician to confirm correctness, or verification via a formal proof assistant, while also ensuring the review process itself doesn't become a huge workload.

Frankly, there's no answer to this right now, so this result currently sits on Gowers's blog, existing only as a link.

Beyond the question of attribution, what truly worries Gowers is this:

The foundation of how mathematicians are trained is being pulled out from under them.

The classic path for training doctoral students in research is to give a newcomer an open problem of moderate difficulty as an entry point.

The problems in Nathanson's paper were originally perfect fodder for this.

But now, ChatGPT 5.5 Pro solved them in two hours.

This raises the barrier to entry: before, you only needed to prove something no one had proven; now you need to prove something AI cannot prove either.

Gowers wasn't entirely pessimistic, though. He offered two mitigating factors.

One is that doctoral students can also use AI.

In the future, the research threshold might not be about tackling "problems AI can't solve" alone, but rather, under human-AI collaboration, producing results that AI cannot achieve independently.

Gowers himself has been doing a lot of this type of human-AI collaborative math research lately. He says AI can indeed provide useful contributions, but hasn't yet reached the point of independently generating game-changing ideas.

The other is that combinatorics may be where AI breaks through most easily.

Combinatorics typically works backward from a given problem, whereas other branches of mathematics more often explore forward from an idea.

The latter requires judging what observation is interesting and what direction is worth pursuing—this aesthetic judgment is probably harder for AI and currently remains a human advantage.

But he also specifically emphasized that everything above only applies to current AI. Large models are iterating too fast, and today's judgment might be obsolete in months.

And he added a poignant jab:

“If someone does mathematics with the goal of having their name permanently attached to a theorem or definition, pursuing 'immortal nomenclature,' the era in which that is possible may soon be over, for everyone.”

Gowers used a thought experiment to get to the core of the issue:

Suppose a mathematician solves a major problem through extended dialogue with an AI. The mathematician provided guidance, but the main ideas and all technical work were done by the AI. Would we consider this a major achievement for the mathematician?

Gowers's answer is: No.

In that case, what is the point of learning math in the AI era?

Gowers says that just as an excellent programmer is better at vibe coding than an ordinary person, a mathematician who has truly done research will be better at collaborating with AI. The deeper your understanding of the problem-solving process itself, the better you can use AI.

Mathematics itself is a highly transferable foundational thinking ability. Future math researchers may lose the academic honor of exclusive theorem naming, but the accumulated intellectual foundation will be the best personal asset in the AI era.

Terence Tao's Three-Layer Pyramid

Actually, Terence Tao saw the impact of AI on mathematical research very, very early on.

Today, he proposed a "pyramid" that splits the work of solving a mathematical problem into three layers:

Proof Generation: Constructing a complete proof.

Proof Verification: Confirming that a proof is correct.

Proof Digestion: Truly understanding what this proof says, why it's right, and what deeper structure it reveals.

The first two are being automated by AI at an astonishing rate.

But the third—digestion—is far from solved.

This will trigger an unprecedented "cognitive overload":

Proofs are being generated in vast quantities, practically for free, and a machine can even verify them for you, yet no one has truly digested them.

Tao calls this "proof indigestion."

To this, some might suggest:

Then let's automate the third step too. Train AI to present proofs in a better mathematical writing style to make them easier to understand.

Tao's point, though, is that blindly optimizing for "readability" metrics might actually make the end result worse.

He uses a culinary analogy.

We chew food to aid digestion. Cooking techniques can make food more tender, reducing the need for chewing.

But if you set out to optimize digestion completely by minimizing the "amount of chewing needed," the logical optimum is to throw all the food into a blender and pipe it directly into your stomach through a tube.

This technically solves the digestion problem. But no one wants to eat like that. It would cause major problems for both body and mind.

The value of eating has never been just nutrient intake.

The sensory experience, the social context, and the satisfaction from the act of chewing itself... these by-products are what humans enjoy most.

Optimize away all friction, and you don't get a better diet; you get a feeding tube.

The same goes for mathematics.

We must distinguish what constitutes essential friction in math learning.

Some "difficulty" in proofs is artificially created.

Unclear phrasing, messy structure... these "artificial difficulties" can indeed be removed by reading papers with AI, much like marinating meat to tenderize it before cooking.

But there is another kind, which is "natural difficulty."

It is supposed to be hard.

The reader needs to "chew" on it to gain true understanding, and in that process, new inspirations can spark.

This is like what Tao previously said on a podcast: he deliberately blocks out time in his schedule for "serendipitous encounters."

Seeing this, some might still say: let AI solve everything, keep refining the evaluation criteria, and simply factor in "natural difficulty" as well, right?

But in fact, not every problem can be treated as an optimization problem, where iterating long enough is guaranteed to produce the result we want.

This isn't how humans approach food.

Handcrafted dishes by a Michelin-starred chef are still treasured more than machine-processed food, even if the latter is safe, aesthetically pleasing, easy to digest, convenient, and tastes decent.

It's not that processed food isn't useful.

It's just that no one would seriously propose replacing the culinary arts of humanity entirely with it.

This is called "the breath of the wok," something only humans can bestow.

Don't Fall Into the Blender

Two Fields Medalists, facing the same storm, see different things.

Gowers sees a crisis.

Those "entry-level tracks" originally prepared for young mathematicians are being flattened by AI. The foundation of the training system is shaking, the rules of academic publishing are failing.

Where is the path for newcomers?

To this, Terence Tao doesn't really have an answer either. What he provides is a boundary.

AI can generate proofs and verify proofs, but "digestion," at least for now, is uniquely human.

It's not that AI can't do it, but...

We cannot hand it over.

This isn't simply a knowledge-based task. The act of "digestion" touches upon intelligence itself.

This truly is an era in which "meaning" is at stake.

AI is gradually pushing us into a corner, endlessly pressing the same question:

What, exactly, is uniquely human and most precious?

In the field of mathematics, that thing might be the beneficial "natural difficulty" that Terence Tao speaks of.

That knowledge you must chew, struggle, and explore yourself before it truly becomes a part of you.

Perhaps the same holds true for other fields.

The blender can pulverize everything.

But some things will always need to be done in person.

Do not become a human battery plugged into a tube, like in The Matrix.

Reference links:
https://gowers.wordpress.com/2026/05/08/a-recent-experience-with-chatgpt-5-5-pro/
https://x.com/wtgowers/status/2052830948685676605
https://mathstodon.xyz/@tao/116551624228986501
