Recently, Fields Medalist and UCLA mathematics professor Terence Tao published a hands-on demonstration video on YouTube, approximately 26 minutes long, detailing how to use Anthropic's Claude Code agent tool to complete the full formalization process of a mathematical proof in the Lean theorem prover.
(Source: YouTube)
Tao clarified the objective at the beginning of the video: to take a "singleton lemma" from set theory and turn its informal natural-language statement into code that the Lean system can compile and rigorously verify. In short, the lemma states that for any set A and element x, the condition that the singleton set {x} belongs to A is equivalent to certain specific subset properties.
Although this is a relatively elementary lemma in mathematical terms, formalizing it in the rigorously strict type-theory environment of Lean involves a significant amount of tedious coding work with extremely high syntactic demands.
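To give a flavor of what such a formalization looks like, here is a representative elementary singleton equivalence stated in Lean 4 with Mathlib. This is not the exact statement from Tao's video, only an illustration of the kind of lemma involved; `Set.singleton_subset_iff` is an existing Mathlib lemma.

```lean
import Mathlib.Data.Set.Basic

-- Representative example (not the video's exact statement): an
-- elementary equivalence about singleton sets. Lean checks that the
-- cited Mathlib lemma proves precisely this proposition.
example {α : Type*} (x : α) (A : Set α) : {x} ⊆ A ↔ x ∈ A :=
  Set.singleton_subset_iff
```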
This was not Tao's first attempt at this task. About nine months ago, in the "Equational Theories" project he led, he had already manually completed this proof using mainstream tools available at the time, such as GitHub Copilot.
(Source: YouTube)
By reattempting this problem with Claude Code, Tao aimed to directly compare this new generation of "agentic coding tools" against the previous generation of code-completion tools.
Unlike early versions of GitHub Copilot, which could only provide a few lines of automatic code completion based on cursor position, Claude Code is an agent system running in a terminal. It can understand complex natural language instructions, autonomously read file directories, plan steps, and automatically execute code editing and modifications. In Tao's view, this leap in capability may allow AI to truly take over the repetitive labor known as "bureaucracy" in mathematical research.
Even Experts Can Stumble When Using AI
Interestingly, the smooth workflow shown in the video was not achieved in one go. Tao admitted during the recording that this was his third attempt to complete the task with Claude Code; the two earlier attempts had failed, each for a different reason.
In the first attempt, Tao directly gave a high-level instruction, asking Claude to "complete the entire proof." As a result, the AI ran continuously for 45 minutes, consumed a massive amount of tokens, caused his computer to crash, and ultimately failed to produce any valid results.
One commenter directly tagged Anthropic, saying, "Give Terence Tao unlimited token access—maybe the Mathematics 2.0 era will arrive sooner!" While sounding like a joke, this comment hit on a real pain point of current AI tools: when doing fine-grained work, tokens get consumed remarkably fast.
(Source: YouTube)
In the second attempt, he changed his strategy, asking the AI to execute step-by-step by lemma (Lemma 1, 2, 3). This time, it took 25 minutes to complete successfully, but the recording failed to save due to a screen recording software glitch.
Learning from the first attempt, in the third try (the one shown in the published video), Tao adopted a highly structured "scaffolding" strategy. He wrote an extremely detailed "recipe" at the top of the file, breaking down the task into initial definitions, outline setup, and the step-by-step proofs of three sub-lemmas, thereby constraining the AI's divergence.
1. Skeletonization
In the initial phase of the workflow, Tao instructed Claude Code not to rush into derivations. Instead, he had it build a macro-level framework for the entire proof using Lean's "sorry" placeholder. This step proceeded exceptionally smoothly; the AI accurately identified logical breakpoints in the informal proof and converted them into Lean code structures. Tao pointed out that having the AI first write a skeleton with "sorry" placeholders and then filling them in one by one is currently the most efficient mode of human-AI collaboration.
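As a hedged sketch of this "skeleton first" pattern (the lemma names and statements below are hypothetical, not the actual code from the video), a Lean 4 file with Mathlib might look like the following. Because `sorry` is accepted by the compiler (with a warning), the overall structure can be checked before any detail is proved:

```lean
import Mathlib.Data.Set.Basic

-- Hypothetical skeleton: each `sorry` is a placeholder to be filled in
-- later by the agent or a human. The file still elaborates, so the
-- decomposition into sub-lemmas can be validated up front.
variable {α : Type*} (x : α)

lemma step1 : x ∈ ({x} : Set α) := by
  sorry  -- to be filled in (e.g. with `Set.mem_singleton x`)

lemma step2 (S : Set α) (h : S = {x}) : x ∈ S := by
  sorry  -- to be filled in using step1 and the hypothesis `h`

-- The main result is assembled from the sub-lemmas even before
-- their proofs exist.
theorem singleton_demo (S : Set α) (h : S = {x}) : x ∈ S :=
  step2 x S h
```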
2. Getting Stuck and Human Intervention
However, when filling in the specific proof details for Lemma 1, Claude Code's shortcomings began to show. Because Lean's underlying logic requires extreme rigor, the AI exhibited a tendency to "overthink" when faced with equation substitutions in informal language. It frequently attempted to expand underlying mathematical definitions rather than mechanically following the steps provided by humans.
In the video, the AI performed extensive backtracking and self-correction in the background, consuming significant computational resources and making the derivation process unusually long. During this process, Tao's workstation unexpectedly crashed once. After the system recovered, faced with the dilemma of the AI complicating simple steps, Tao decisively chose to intervene manually. He took over the keyboard and quickly entered a strategy based on the "congr" (congruence/equation substitution) command, instantly breaking the deadlock.
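The `congr` tactic Tao reached for reduces an equality between applied terms to equalities between their components, exactly the kind of mechanical substitution the AI was overcomplicating. A minimal illustration (this toy goal is ours, not the one from the video):

```lean
-- `congr 1` splits the goal `f a = f b` into `f = f` (closed
-- automatically by reflexivity) and `a = b`, avoiding any unfolding
-- of the definition of `f`.
example (f : Nat → Nat) (a b : Nat) (h : a = b) : f a = f b := by
  congr 1
  -- remaining goal: a = b
  exact h
```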
He objectively commented, "Over-relying on tools might cause you to lose your intuition about proofs. When AI gets stuck in a dead end, humans taking over directly is often faster than waiting for it to correct itself."
3. Evolving a "Parallel Workflow"
As the process advanced to Lemma 2 and Lemma 3, Tao demonstrated an innovative workflow. Once he confirmed the AI had mastered the skeleton-building technique, he no longer played the simple role of a "supervisor" but began a "dual-track operation" with the AI. While Claude Code autonomously analyzed and attempted to fill in the underlying logic of Lemma 3 in the background, Tao manually completed the relatively intuitive "sorry" sections of Lemma 2 in the earlier part of the code.
This human-machine parallel operation mode ultimately compressed the total time to under half an hour, and the final code passed the Lean compiler's rigorous scrutiny without any errors. Tao summarized that splitting the task—having humans handle straightforward logic while assigning heavy code-generation tasks to agents—is currently the most viable approach.
From "Mediocre Teaching Assistant" to "Junior Collaborator"
Examining this video in the context of Tao's series of AI experiments in recent years reveals a clear trajectory of technological advancement.
As early as the beginning of this generative AI boom, Tao actively tested various chatbots, comparing them to "mediocre but not entirely incompetent graduate students." At that time, AI was prone to hallucinations when handling tasks like epsilon-delta limit proofs in calculus, frequently confusing variable domains or omitting boundary conditions, and remained more of a novelty than a working tool.
By 2025, as foundational large model capabilities improved, Tao publicly tested GPT-5-level models on complex academic literature retrieval. In that test, the AI could quickly mine specific theorem origins from massive libraries of unstructured papers, saving him weeks of desk research. However, AI still played the auxiliary role of a "senior librarian" rather than directly participating in proof generation.
Entering early 2026, the situation changed qualitatively. Large models, represented by ChatGPT, began making notable progress on the well-known database of open Erdős problems, attempting to "independently" solve these hundreds of unsolved questions spanning number theory and combinatorics. Tao's GitHub page also recorded attempts to use these systems to automate the handling of peripheral conjectures, filling in marginal areas that humans lack the time to explore.
(Source: X)
This demonstration using Claude Code serves as a bridge connecting the "frontier exploration" mentioned above with "daily practice." While not as dramatically public as Google's AlphaProof solving International Mathematical Olympiad (IMO) problems, Tao's demonstration in Lean's type-theory-ensured environment is more grounded and closer to the actual research routine of contemporary mathematicians.
Of course, while affirming the efficiency revolution brought by AI, Tao and the mathematical community he represents have not shied away from the technology's current limitations.
On one hand, there are concerns in academia that heavy reliance on AI-generated proofs could introduce "black box" issues. Even though the Lean compiler can guarantee 100% logical correctness at the foundational level, lengthy machine-generated code lacks the intuitive beauty and readability characteristic of human mathematics. This could transform mathematics from an "art of understanding" into mere "symbolic verification."
In response, Tao maintains the objectivity and neutrality characteristic of a scientist. He tends to define AI as a powerful "experimental mathematics" tool. For tasks heavily reliant on computation and pattern matching, AI is irreplaceable; but in core areas like the Riemann Hypothesis that require disruptive intuition and deep conceptual restructuring, human dominance remains secure.
As he stated at an IPAM meeting: "As long as the time AI saves you is greater than the time you waste correcting it, it is a successful tool." This unedited 26-minute video serves as the best endorsement of that assertion.
In future mathematical research, "human-machine collaboration" may become the new normal. By then, AI may be able to serve as a "junior collaborator," completely bridging the bottleneck between intuitive conception and computerized formal verification in mathematics.
Reference Links:
Video URL: https://www.youtube.com/watch?v=JHEO7cplfk8&t=124s
Operations/Layout: He Chenlong