On April 13, the Stanford Institute for Human-Centered Artificial Intelligence (HAI) released the 2026 AI Index Report. Spanning over 400 pages, it covers technical capabilities, investment landscapes, employment impacts, and public perception, making it the most comprehensive third-party annual audit of the industry to date.
The AI Index has been published annually since 2017, and this year's conclusion can be summarized in one sentence: AI capabilities are outpacing everything at unprecedented speed—outstripping regulatory frameworks, public trust, educational systems, and even AI companies' own willingness to maintain transparency.
The following are the core insights extracted from the report.
Original Link: https://hai.stanford.edu/ai-index/2026-ai-index-report
I. Can Solve Olympiad Math, Can't Read Clocks
Let's start with a detail.
In this year's report, there is a chart where the horizontal axis represents time and the vertical axis represents AI performance relative to humans across various tasks. One line on the chart rises almost vertically: coding capability. SWE-bench Verified—a standard test measuring AI's ability to autonomously complete real-world software engineering tasks—jumped from 60% to nearly 100% within a single year. During the same period, the success rate of AI agents handling real-world tasks surged from 20% to 77.3%, and the problem-solving rate for cybersecurity issues skyrocketed from 15% to 93%.
"Humanity's Last Exam" is a set of test questions jointly designed by nearly a thousand domain experts worldwide, specifically created to stump AI. It covers almost all high-difficulty disciplines including physics, mathematics, history, and law. In 2025, the top-ranked model could only answer 8.8% of the questions correctly. Today, frontier models have surpassed a score of 50%.
This is not linear growth; it is a leap.
However, the same report contains another line: the success rate of robots at real-world household tasks, such as folding clothes and washing dishes, remains at only 12%. AI still cannot reliably read analog clocks. Generating coherent videos remains difficult, multi-step planning still produces errors, and certain expert-level academic exams still stump the models.
Gemini Deep Think achieved a gold medal score of 35 points at the 2025 IMO (International Mathematical Olympiad) working in natural language within a 4.5-hour time limit, surpassing the silver medal score of 28 points obtained in 2024. On ClockBench, top models correctly read analog clocks 50.1% of the time, compared to 90.1% for humans.
The distribution of capabilities is uneven—some dimensions have already surpassed the range humans can verify, while others are still crawling. This is the true state of AI in 2026, and it is the underlying context for all subsequent questions.
II. US Investment is 23 Times China's, but AI Talent Inflow Has Dropped 89%
In 2025, global private AI investment reached $344.7 billion, a year-on-year increase of 127.5%. Total corporate-level AI investment reached $581.7 billion, more than doubling in one year.
The United States has been the most aggressive spender in this arms race. In 2025, US AI investment amounted to $285.9 billion, which is 23 times that of China, ranked second ($12.4 billion). This gap is overwhelming.
However, within the same report, another set of figures points in the completely opposite direction.
From 2017 to 2026, the number of top AI scholars migrating to the United States decreased by 89%. In the past year alone, this figure dropped by another 80%.
Put together, the implication of these two sets of numbers is clear: the US is pouring more and more money into AI, but the pool of top-tier talent that money can recruit keeps shrinking. While capital continues to flood in, its marginal value is being eroded by the loss of talent.
China's investment logic differs. The report points out that comparing solely based on private investment figures systematically underestimates the volume of capital China deploys into AI. Through the mechanism of "government guidance funds," the Chinese government has cumulatively deployed over $912 billion across various fields, including AI, since 2000. This money does not flow through market channels and does not appear in private investment data, but it undeniably exists.
In terms of the number of models, the US released 50 "notable" models in 2025, while China released approximately 30; the gap is narrowing. Regarding industrial robot installations, China installed 295,000 units in 2024, compared to 34,200 in the US, a gap of 8.6 times. The US and China are running on two parallel tracks in AI, with direct confrontation occurring only in a portion of these areas.
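The headline US–China ratios above reduce to simple division. As a quick sanity check (using only the figures quoted in this article; the rounding is ours):

```python
# Sanity-check the US/China ratios implied by the figures quoted above.

us_ai_investment_bn = 285.9   # US private AI investment, 2025 (USD billions)
cn_ai_investment_bn = 12.4    # China private AI investment, 2025 (USD billions)
investment_ratio = us_ai_investment_bn / cn_ai_investment_bn
print(f"Investment ratio (US/China): {investment_ratio:.1f}x")     # ≈ 23.1x

cn_robot_installs = 295_000   # industrial robots installed in China, 2024
us_robot_installs = 34_200    # industrial robots installed in the US, 2024
robot_ratio = cn_robot_installs / us_robot_installs
print(f"Robot installs (China/US): {robot_ratio:.1f}x")            # ≈ 8.6x
```

Both results match the article's stated multiples of "23 times" and "8.6 times".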
III. 22-Year-Old Programmers Feel It, While CEOs Still Say AI is Just a Tool
Data on employment impacts has become undeniably clear for the first time this year.
Since 2024, employment numbers for software developers aged 22 to 25 have declined by nearly 20%. During the same period, employment for colleagues aged 26 and above has remained basically flat or even seen slight growth. This does not mean the entire software industry is shrinking—rather, the AI impact is starting from the bottom, precisely cutting off entry-level positions.
A similar pattern has emerged in the customer service sector: junior positions are contracting, while senior positions remain temporarily safe.
One-third of corporate executives surveyed by McKinsey stated they expect to further reduce staff sizes in the coming year, particularly concentrating in service industries, supply chains, and software engineering. This is a plan for the future, not something that has already happened. What has already happened is that young people are feeling it first.
The report's researchers also raised an important caveat: employment data is confounded by the macroeconomic environment, making it impossible to fully isolate AI's impact. However, they also pointed out an anomaly—unemployment has risen more in occupations with low AI exposure than in those with high AI exposure. This contradicts the simple narrative of direct AI replacement, suggesting that a more complex restructuring of the labor market may be underway.
The report also provided figures on productivity gains brought by AI: a 14% increase in the customer service sector and a 26% increase in software development. These gains are real, but those enjoying these gains are workers who are already on the job and experienced. Young people entering the market are facing an entrance where the number of positions itself is decreasing.
Gains are concentrated at the top; the cost falls on the bottom.
IV. Models Are Getting Stronger, But Fewer Companies Are Revealing How They Are Trained
There is a set of numbers in this report that is the least cited but possibly the most important.
The Foundation Model Transparency Index measures the extent to which major AI companies disclose their model training data, computing resources, capability boundaries, risks, and usage policies. The average score for this indicator last year was 58, dropping to 40 this year.
The report's conclusion is even more direct: the models with the lowest transparency are often the most capable ones.
This is an interesting reversal. While AI capabilities are accelerating, the information available to the public to understand, audit, and supervise those capabilities is systematically decreasing. What training data large models use, how much computing power they consume, what known limitations exist—questions that should receive more scrutiny as capabilities grow—are instead becoming more opaque.
Figures on public trust also corroborate this. In global surveys, only 31% of Americans expressed trust that their government can effectively regulate AI, ranking second to last among all surveyed countries (China is last at 27%). The EU figure is 53%, a significant gap.
Meanwhile, Gen Z's sentiment towards AI is shifting. Once the earliest enthusiastic adopters of generative AI, this demographic now shows rising anxiety and anger in survey data. A researcher cited by TechCrunch put it more bluntly: "AI leaders themselves are saying 'if nothing is done, many people will suffer terribly,' and then wonder why the public is anxious."
Four out of five American high school and college students use AI to complete academic tasks, yet only 6% of teachers report that their schools have clear AI usage policies. Capabilities are running ahead, frameworks are lagging behind, and the blank space in between is occupied by hundreds of millions of ordinary people using AI every day.
V. Training One Model Equals 17,000 Cars Running for a Year
AI capabilities are accelerating, and so are the costs. It's just that most of these costs are invisible.
According to the report, training xAI's Grok 4 produced an estimated 72,800 tons of CO₂-equivalent emissions, equivalent to the greenhouse gases produced by 17,000 cars driven for an entire year. An independent estimate by Epoch AI puts the number even higher, at approximately 140,000 tons.
For comparison, OpenAI's GPT-4 training emissions were about 5,184 tons, and Meta's Llama 3.1 405B was about 8,930 tons. From GPT-4 to Grok 4, in less than two years, carbon emissions from a single training run have increased by more than 10 times.
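The two comparisons above are straightforward divisions, and they check out. A quick verification (report figures as quoted; the per-car number is the implied average, not an independent estimate of ours):

```python
# Check the emissions ratios implied by the report's figures.

grok4_tons = 72_800    # estimated training emissions for Grok 4 (tCO2e)
gpt4_tons = 5_184      # estimated training emissions for GPT-4 (tCO2e)
cars = 17_000          # cars the report equates Grok 4's emissions to

growth = grok4_tons / gpt4_tons
print(f"Grok 4 vs GPT-4: {growth:.1f}x")        # ≈ 14.0x, i.e. "more than 10 times"

per_car = grok4_tons / cars
print(f"Implied annual emissions per car: {per_car:.2f} tCO2e")  # ≈ 4.28 tCO2e
```

The implied ~4.3 tCO₂e per car per year is in line with typical estimates for an average passenger vehicle, so the report's comparison is internally consistent.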
Consumption on the inference side is also accumulating. The annual water consumption attributable to GPT-4o inference (water used for cooling data-center servers and for hydroelectric power generation) is estimated to exceed the annual drinking water needs of over 12 million people. The total electricity capacity of global AI data centers has reached 29.6 GW, equivalent to the peak electricity consumption of the entire state of New York, and comparable to the national electricity consumption of Switzerland or Austria.
Growing in sync with energy consumption is the concentration of computing power. Nvidia's GPUs currently account for over 60% of the world's total AI computing power, while global AI computing power has grown 3.3 times annually since 2022, cumulatively reaching 30 times the 2021 level. The physical foundation of the entire AI system is accelerating towards concentration among a few hardware suppliers and hyperscale cloud service providers.
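The two compute-growth figures are mutually consistent: at 3.3x per year, reaching 30x the 2021 baseline takes roughly three years, which matches the "since 2022" window. A sketch of that consistency check:

```python
import math

annual_growth = 3.3    # AI compute growth factor per year (from the report)
cumulative = 30        # cumulative growth vs the 2021 level (from the report)

# Years of 3.3x/year growth needed to reach 30x overall: solve 3.3^t = 30.
years_needed = math.log(cumulative) / math.log(annual_growth)
print(f"Years to reach 30x at 3.3x/year: {years_needed:.2f}")  # ≈ 2.85

# The doubling time implied by 3.3x annual growth.
doubling_months = 12 * math.log(2) / math.log(annual_growth)
print(f"Implied doubling time: {doubling_months:.1f} months")  # ≈ 7.0 months
```

A doubling time of about seven months underscores how quickly the hardware base is compounding.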
These costs do not appear on AI product price tags, nor in the statistical figures of productivity gains. But they are real; they are simply distributed across the atmosphere, groundwater, and power grids.
Final Thoughts
There is a detail in the report that can serve as a footnote for the entire piece.
AI can already solve Mathematical Olympiad problems, but it still cannot reliably read analog clocks.
This unevenness is not a bug in AI; it is a characteristic of this stage. Capabilities in certain dimensions have already exceeded the range humans can intuitively verify, while other dimensions are still crawling. We are currently at a moment where both curves are moving rapidly—high-speed capability expansion coincides with a simultaneous slide in governance, trust, and transparency.
As Stanford researchers wrote in the report's preface: This year's report reveals that the gap between "what AI can do" and "whether we are prepared to manage it" is widening. What this report itself can do is use data to make the gap visible.
What happens after the gap widens is another question.