New Zhiyuan Report
Editors: Peach Good Sleep
[New Zhiyuan Digest] Today, Google DeepMind's "AI Mathematician" Aletheia is on a tear, solving mathematical conjectures and writing papers on its own. Even more striking, the gold-medal-winning Gemini has worked through 18 core research challenges.
The next Nobel Prize winner might just be pre-booked by Gemini!
Google DeepMind has once again dropped a bombshell on the global research community, releasing two heavyweight papers at once:
Gemini Deep Think becomes the "Research Partner," breaking through research-level problems in mathematics, physics, and computer science.
Previously, AI winning gold medals at the IMO and ICPC international competitions was already impressive.
This time, Gemini goes all out and truly engages in research!
Google has built an "AI Mathematician" based on Gemini, codenamed Aletheia. It has achieved multiple research milestones on PhD-level problems.
These include independently writing and publishing academic geometry papers, and completing a systematic evaluation of 700 open problems in the "Erdős Problems" database.
On the IMO-ProofBench benchmark, Aletheia leads the field with a score of 91.9%, setting a new state of the art (SOTA).
More disruptive still, it has a core human skill: it corrects its own mistakes and openly acknowledges problems it cannot solve.
The so-called Millennium Prize Problems may not be far from being solved one by one.
Not only that, in physics and computer science, Gemini Deep Think collaborated with experts to conquer 18 long-stalled research challenges.
The results include settling a decade-old conjecture in submodular optimization and breaking bottlenecks in discrete algorithms, with further advances in machine learning, combinatorial optimization, information theory, and economics. These are achievements worthy of the record books.
Human research workflows are on the verge of a disruptive transformation.
Gemini's accelerating evolution is breaking through in field after field, in what amounts to a "dimensionality-reduction strike" on research.
Google's "AI Mathematician" Aletheia Makes Its Grand Debut, Crushing PhD-Level Problems
In the summer of 2025, Gemini Deep Think (Advanced) first won the IMO gold medal, followed by winning the ICPC championship.
Now, Gemini has completely crossed the competition threshold and officially invaded the "deep waters" of human intelligence.
Unlike IMO-level competition difficulty, research-level mathematical problems require invoking "advanced techniques" from a vast sea of literature.
While "Foundation Models" (FMs) are knowledgeable, they lack specialized data, often leading to misunderstandings or even "hallucinations" when handling advanced disciplines.
To this end, Google DeepMind internally built Aletheia, a mathematical research AI agent backed by the powerful Gemini Deep Think.
Paper link: https://github.com/google-deepmind/superhuman/blob/main/aletheia/Aletheia.pdf
In ancient Greek, Aletheia means "truth."
It achieves iterative "end-to-end" generation, verification, and modification of solutions in natural language.
Specifically, Aletheia comes with a "natural language verifier" that can spot flaws in candidate solutions and achieve an iterative "generate-modify" process.
Most crucially, it can acknowledge when it cannot solve a problem, greatly improving researcher efficiency.
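The generate, verify, and revise cycle described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not Aletheia's actual implementation: the names `solve_with_verification`, `Verdict`, and the toy `generate`/`verify`/`revise` callables are all hypothetical stand-ins for model calls, and the fixed round limit is an assumed stopping rule.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    ok: bool        # True if the verifier found no flaw
    critique: str   # natural-language description of the flaw, if any


def solve_with_verification(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Verdict],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return a solution the verifier accepts, or None to signal 'unsolved'."""
    candidate = generate(problem)
    for round_index in range(max_rounds):
        verdict = verify(problem, candidate)
        if verdict.ok:
            return candidate
        # Revise only if another verification pass remains.
        if round_index < max_rounds - 1:
            candidate = revise(problem, candidate, verdict.critique)
    # Abstain rather than return an unverified answer.
    return None


# Toy stand-ins for the model calls (hypothetical, for illustration only).
def toy_generate(problem: str) -> str:
    return "x = 3"


def toy_verify(problem: str, candidate: str) -> Verdict:
    if candidate == "x = 2":
        return Verdict(True, "")
    return Verdict(False, "substituting the value gives 5, not 4")


def toy_revise(problem: str, candidate: str, critique: str) -> str:
    return "x = 2"


def always_reject(problem: str, candidate: str) -> Verdict:
    return Verdict(False, "not convinced")


solved = solve_with_verification("Solve x + 2 = 4", toy_generate, toy_verify, toy_revise)
abstained = solve_with_verification("Solve x + 2 = 4", toy_generate, always_reject, toy_revise)
print(solved)     # x = 2
print(abstained)  # None
```

Returning `None` instead of the last candidate is the design choice that mirrors the "acknowledge it cannot solve the problem" behavior: an answer that never passed verification is not reported as a solution.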
Overview of Aletheia: A Deep Think-driven mathematical research AI agent capable of iterative generation, verification, and correction for research-level mathematical problems.
In summary, the three core technical pillars driving Aletheia are:
- Gemini Deep Think Advanced: Specifically used to tackle extremely difficult reasoning problems;
- Novel inference-time Scaling Law: It holds across a wide span of difficulty, from Olympiad-level problems to PhD-level specialized exercises;
- Powerful tool-calling ability: Deep integration with Google Search and web browsing addresses long-standing failure modes in mathematical research, virtually eliminating fabricated references and inaccurate calculations.
Since reaching IMO gold medal level in July 2025, Gemini Deep Think has progressed at a remarkable pace.
With an increase in inference-time compute, its score on the advanced IMO-ProofBench test reached 90%.
Google DeepMind showed that the Scaling Law still holds when moving from Olympiad-level problems to PhD-level exercises (based on the internal FutureMath Basic benchmark).
Notably, Aletheia achieves higher reasoning quality even with less inference compute.
As of January 2026, the latest advanced version of Deep Think has significantly outperformed the IMO gold medal version (July 2025) on Olympiad-level problems. The inference-time Scaling Law also applies to PhD-level exercises. Aletheia achieves a further leap in reasoning quality with even lower inference-time compute. All results are scored by human experts.
The First Batch of 6 Papers: One Fully AI-Generated, 3 Already Published
In real research-level mathematics, Aletheia has proven its capabilities, already achieving several notable "autonomous breakthroughs."
The first six papers completed with Aletheia fall into the following categories:
- Independent completion, 0 human involvement
The paper "Eigenweights for arithmetic Hirzebruch Proportionality" was generated entirely by Aletheia without any human intervention.
It computes certain structural constants known as "eigenweights" in arithmetic geometry.
Paper link: https://arxiv.org/abs/2601.23245
- Human-AI collaboration
The paper "Lower bounds for multivariate independence polynomials and their generalisations" was completed through collaboration between humans and Aletheia, jointly proving bounds on systems of interacting particles known as independent sets.
Paper link: https://arxiv.org/abs/2602.02450
- Large-scale semi-autonomous evaluation, tackling the Erdős Problems
The paper "Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems" evaluated 700 open problems in Bloom's "Erdős Problems" database and autonomously solved four open problems listed there.
On problem Erdős-1051, the model provided an autonomous solution, which led to the generalized results reported in a separate study, "Irrationality of rapidly converging series: a problem of Erdős and Graham."
Paper link: https://arxiv.org/abs/2601.22401
Paper link: https://arxiv.org/abs/2601.21442
Furthermore, Aletheia contributed intermediate propositions in two additional papers as shown below.
Paper link: https://arxiv.org/abs/2601.18557
Paper link: https://arxiv.org/abs/2601.23229
It's worth mentioning that there have been previous works with Gemini exploring research-level mathematics, but the collaboration scale and number of problems solved were relatively small.
Additionally, Google DeepMind established a taxonomy that classifies "AI-assisted mathematical research" by significance and degree of AI contribution.
In the table below, results already classified as Level 2 (publishable quality) have been submitted to renowned journals.
Currently, Google has not yet achieved any Level 3 (major advancement) or Level 4 (milestone breakthrough) results through Gemini.
A taxonomy table covering all AI-assisted mathematical results in this study. Results listed as Level 2 have been submitted for publication.
Ending a Decade-Old Conjecture, Conquering 18 Major Research Challenges
Beyond its prowess in mathematics, Gemini Deep Think also demonstrates significant potential in computer science and physics.
The paper "Accelerating Scientific Research with Gemini: Case Studies and Common Techniques" builds on similar agentic reasoning ideas and summarizes "secrets" for efficient collaboration, particularly the "Advisor" mode:
Humans guide AI through an iterative "Vibe-Proving" cycle to verify intuition and refine proofs.
Paper link: https://arxiv.org/abs/2602.03837
Additionally, Google details tactical techniques such as "balanced prompting" (asking the AI to attempt both a proof and a disproof at the same time, to prevent confirmation bias) and code-assisted verification.
These methods, combined with the model's ability to connect across different scientific fields through deep structural understanding, are changing how theoretical research is conducted.
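As a rough illustration of balanced prompting, the sketch below sends the same statement to a model twice, once framed as "prove" and once as "disprove", so that neither framing is privileged. The prompt wording, the function names `balanced_prompts` and `balanced_query`, and the `echo_model` stand-in are assumptions made for this example; the paper's actual prompts are not reproduced here.

```python
from typing import Callable, Dict

# Hypothetical prompt templates; the wording is an assumption, not Google's.
PROVE_TEMPLATE = (
    "Try to PROVE the statement below. If you get stuck, say where.\n"
    "Statement: {statement}"
)
DISPROVE_TEMPLATE = (
    "Try to DISPROVE the statement below by finding a counterexample. "
    "If none exists, say why.\n"
    "Statement: {statement}"
)


def balanced_prompts(statement: str) -> Dict[str, str]:
    """Build one prompt per stance so both directions are attempted."""
    return {
        "prove": PROVE_TEMPLATE.format(statement=statement),
        "disprove": DISPROVE_TEMPLATE.format(statement=statement),
    }


def balanced_query(statement: str, ask_model: Callable[[str], str]) -> Dict[str, str]:
    """Ask the model under both stances and return the answers side by side."""
    prompts = balanced_prompts(statement)
    return {stance: ask_model(prompt) for stance, prompt in prompts.items()}


def echo_model(prompt: str) -> str:
    # Stand-in for a real model call: returns the first line of the prompt.
    return prompt.splitlines()[0]


answers = balanced_query("Every submodular function is monotone.", echo_model)
for stance, answer in answers.items():
    print(stance, "->", answer)
```

A human expert, or a separate verification step, would then compare the two answers; a convincing counterexample from the "disprove" branch is a strong signal that the "prove" branch contains an error.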
This work builds upon the successful deployment of Gemini Deep Think Advanced to assist in reviewing CS theory papers for the STOC’26 conference.
Diagram of AI reasoning process: Shows how network layers perform broad exploration of solution space, then converge into structured reasoning, ultimately confirmed through automated verification and human review.
By collaborating with experts to conquer 18 research challenges, the Gemini Deep Think Advanced version helped break long-standing bottlenecks in algorithms, machine learning, combinatorial optimization, information theory, and economics.
Accepted to ICLR 2026
Highlights from the paper "Accelerating Scientific Research with Gemini: Case Studies and Common Techniques" include:
- Crossing mathematical boundaries to solve network puzzles
Progress on classic computer science problems like "Max-Cut" (efficiently partitioning networks) and "Steiner Tree" (connecting points in high dimensions) had stalled.
Gemini broke these deadlocks by breaking out of entrenched ways of thinking.
It brought in sophisticated tools from seemingly unrelated branches of continuous mathematics, such as the Kirszbraun theorem, measure theory, and the Stone-Weierstrass theorem, and used them to solve these discrete algorithm puzzles.
- Ending a decade-old conjecture in online submodular optimization
A 2015 theoretical paper posited a seemingly obvious rule for data streams: duplicating a newly arrived item is always less valuable than simply moving the original item. Experts spent a decade trying to prove this.
Gemini designed an extremely tricky "three-item combination counterexample," rigorously proving that this long-held human intuition was wrong.
- Machine learning optimization
Training AI to filter noise often requires engineers to manually tune a mathematical "penalty term."
Researchers invented a new technique that automatically adjusts it but couldn't mathematically explain why it worked.
Gemini analyzed the equations and proved why the method works: it implicitly generates its own "adaptive penalty" as it runs.
- Upgrading economic theory for the AI era
A recent "Revelation Principle" for auctioning AI-generated tokens was mathematically valid only when bids were restricted to rational numbers.
Once the range extended to continuous real numbers, the original proof failed. Gemini used advanced topology and order theory to extend the theorem to accommodate continuous auction dynamics in the real world.
- Cosmic string physics
Calculating gravitational radiation from cosmic strings requires finding analytic solutions to tricky integrals containing "singularities."
Gemini found a novel solution using "Gegenbauer polynomials." This naturally absorbs the singularities, collapsing an infinite series into a closed-form finite sum.
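For readers unfamiliar with Gegenbauer polynomials, their standard generating function gives a sense of why they suit integrands of this kind. The article does not say which identity the derivation actually relies on, so the formula below is offered only as background:

```latex
\frac{1}{\left(1 - 2xt + t^{2}\right)^{\alpha}}
  = \sum_{n=0}^{\infty} C_{n}^{(\alpha)}(x)\, t^{n},
  \qquad |t| < 1,\ |x| \le 1,\ \alpha > 0 .
```

The left-hand side blows up as $t \to 1$ at $x = 1$, while each term on the right is a polynomial in $x$. Expanding a singular kernel in the $C_{n}^{(\alpha)}$ basis therefore lets the integration be carried out term by term on well-behaved functions.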
These achievements span fields from information and complexity theory to cryptography and mechanism design, showcasing how AI is fundamentally changing research work.
Given the fluid, conference-oriented publication mechanism in computer science, Google describes these results according to academic trajectory rather than a rigid taxonomy.
About half of the above results target top-tier conferences, with one accepted to ICLR ’26, and most of the remainder slated for future journal submissions.
Whether by identifying errors or refuting conjectures to correct field direction, these results highlight AI's value as a high-level scientific collaborator.
Gemini Reshapes Research, the Human "Multiplier" Has Arrived
Building on Google's previous breakthroughs, this work demonstrates that general-purpose foundation models, combined with agentic reasoning workflows, can become powerful scientific partners.
Guided by experts including mathematicians, physicists, and computer scientists, Gemini's Deep Think mode is proving its utility in fields centered on complex mathematics, logic, and reasoning.
We are witnessing a fundamental shift in scientific workflows.
As Gemini evolves, it is becoming a "multiplier" for human intelligence, handling tasks like knowledge retrieval and rigorous verification, allowing scientists to focus on conceptual depth and innovative direction.
Whether it's refining proofs, finding counterexamples, or connecting seemingly unrelated fields, AI is becoming an indispensable collaborator in a new chapter of scientific progress.