Reporter | Song Xinyue
Editor | Jin Mingyu, Yang Jun, Du Bo | Proofreader | Zhang Jinhe
After ten consecutive weeks of growth, global token call volume for large AI models has hit the brakes.
According to calculations by National Business Daily reporters based on the latest data from OpenRouter, total global calls to large AI models last week (April 13–19) came to 20.6 trillion tokens, the second consecutive weekly decline. OpenRouter is currently the world's largest AI model API aggregation platform, with more than 5 million developer users, and its API call volume is regarded as a "barometer" for AI application deployment.
Notably, among the models on the leaderboard, Chinese AI models fell 23.77% week-on-week to 4.44 trillion tokens, while U.S. AI models rose 20.62% week-on-week to 4.91 trillion tokens, surpassing China for the first time in nearly two months. The reversal coincides with a global surge in computing power costs.
Turning Point Stems from Global Rise in Computing Costs
Based on the latest OpenRouter data, total global AI large model calls last week reached 20.6 trillion tokens, declining for two consecutive weeks after ten straight weeks of growth. Among listed models, Chinese AI models dropped to 4.44 trillion tokens weekly, down 23.77% from the previous week, while U.S. AI models hit 4.91 trillion tokens, up 20.62% week-on-week, overtaking China for the first time in nearly two months.
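The week-on-week figures above can be used to back out the prior week's volumes, which the article does not state directly. The following is a minimal sketch of that arithmetic; the function name `previous_value` is illustrative, and the only inputs are the four numbers quoted in the paragraph above.

```python
def previous_value(current: float, pct_change: float) -> float:
    """Back out last period's value from the current value and the
    percentage change (e.g. -23.77 for a 23.77% drop)."""
    return current / (1.0 + pct_change / 100.0)


# Figures quoted in the article (trillions of tokens, week of April 13-19).
cn_now, cn_change = 4.44, -23.77   # Chinese models: down 23.77%
us_now, us_change = 4.91, 20.62    # U.S. models: up 20.62%

# Implied volumes for the prior week (April 6-12); derived, not reported.
cn_prev = previous_value(cn_now, cn_change)
us_prev = previous_value(us_now, us_change)

print(f"Implied prior-week volume, Chinese models: {cn_prev:.2f} trillion")
print(f"Implied prior-week volume, U.S. models:    {us_prev:.2f} trillion")
```

Under these inputs the implied prior-week volumes are roughly 5.82 trillion tokens for Chinese models and 4.07 trillion for U.S. models, which is consistent with the statement that the U.S. overtook China only in the latest week.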
The shift from ten weeks of rising global calls to a downturn stems from a worldwide increase in computing costs. Since March, major cloud providers including Alibaba Cloud, Tencent Cloud, and Baidu Cloud have successively raised prices for large model-related services.
On April 8, Zhipu AI released GLM-5.1 and simultaneously raised prices by 10%, marking its third price adjustment this year. Post-adjustment, the cache-hit token price for GLM-5.1 in coding scenarios now approaches that of Anthropic's Claude Sonnet 4.6, signaling that domestic large models have achieved price parity with overseas leaders in core application scenarios for the first time.
Overseas AI giant Anthropic also adjusted its pricing strategy, changing its enterprise product Claude Enterprise subscription model from a "fixed fee of up to $200 per user per month" to "billing based on actual computing consumption plus a $20 monthly fixed fee."
This means monthly costs may decrease for light users but increase for heavy users. Fredrik Filipsson, co-founder of Redress Compliance, a company assisting with software licensing negotiations, stated that the new pricing could double or even triple costs for heavy users.
Domestic Models Face Major Test of "Product Strength"
"With token fees generally rising and costs increasing, users are forced to control total usage to save costs," Hu Yanping, a distinguished professor at Shanghai University of Finance and Economics, told National Business Daily. "When price advantages are no longer prominent, a model's product strength becomes the key factor influencing user choices."
Hu analyzed that last week's rebound in U.S. AI model calls was mainly driven by Anthropic's Claude Sonnet and Opus models. These two models have become "hard currency" in the coding field, whereas domestic large models still need improvement in this aspect.
OpenRouter data shows that last week, Claude Sonnet 4.6 topped the chart with weekly calls of 1.38 trillion tokens, up 19% week-on-week; Claude Opus 4.6 ranked third with 1.22 trillion tokens. Together, the two models accounted for 2.60 trillion of the 4.91 trillion tokens called on U.S. AI models that week, or about 53%.
By contrast, domestic models have been on a "roller-coaster" ride in recent weeks.
In the previous week (April 6–12), Alibaba's Qwen3.6 Plus led globally with 1.66 trillion tokens weekly, but just one week later (April 13–19), it dropped off the leaderboard entirely.
Additionally, reporters noted that Kimi K2.5 and Zhipu's GLM series models, which previously appeared on the list multiple times, have not ranked for three consecutive weeks. Similarly, Step 3.5 Flash from StepFun, which once reached second place, has also missed the list for the past two weeks.
Hu believes that OpenRouter's user base consists mainly of developers and small-to-medium enterprises, who place extremely high demands on how quickly models iterate and how strong they are in vertical scenarios. "Market users tend to concentrate on the leading mainstream models; among the options available to them, users only choose the best," he pointed out. User scenarios increasingly require models to offer strong tool-calling capabilities, multi-agent support, and the ability to carry long-running, complex tasks through to completion. "Most models currently on OpenRouter still need significant improvement in these areas."
Industry insiders also told National Business Daily that since computing costs rose, the industry has generally prioritized tools with stable performance and reliable output quality; price is no longer the primary consideration.
Experts: Global Consumption Still in Rapid Growth Phase
Do short-term data fluctuations indicate that the AI application boom is fading?
"It's too early to draw conclusions from short-term data," Hu cautioned. "OpenRouter's token call volume accounts for only about 2% to 4% of global total consumption. Its ranking fluctuations mainly reflect competition among open-source, second-tier, and newly released models and cannot represent the entire market trend."
In fact, cost pressures are forcing the market to evolve. Hu observed that since early this year, agents and multi-agent applications such as OpenClaw have pushed token call volumes to two to three times their level at the end of last year. The significant cost increase has prompted enterprises and users to actively cut consumption through memory optimization, prompt compression, Harness Engineering, and other means.
Reporters learned that some owners of small and medium-sized businesses have even incorporated token usage into employee performance assessments. The market is moving from a phase of simply pursuing quantity ("volume stacking") to one focused on achieving a higher return on investment ("efficiency improvement").
A deeper change lies in the qualitative transformation of AI application scenarios themselves. A research report by Guolian Minsheng Securities introduced the concept of "token inflation." This does not mean tokens themselves are becoming more expensive, but rather refers to a structural increase in token consumption per unit time and per user.
User demand is shifting from superficial "Q&A" to deep "task execution." Tokens are no longer like "traffic" in the traditional internet era with near-zero marginal costs; they are essential "fuel" for executing production tasks.
JPMorgan issued an extremely optimistic forecast for the Chinese market in its research report, predicting that from 2025 to 2030, China's token consumption will grow at a compound annual rate of 330%, achieving a 370-fold increase within five years.
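The relationship between a compound annual rate and a multi-year fold increase can be checked with the standard compounding formula. The sketch below is illustrative only: the function names are hypothetical, and how the report defines its base year and rate is an assumption, since the article does not say. It shows that a 370-fold increase over five years corresponds to each year's volume being roughly 3.3 times the previous year's, whereas reading "330%" strictly as the annual growth rate (a 4.3x yearly multiplier) would imply a much larger fold increase.

```python
def fold_from_rate(annual_growth_rate: float, years: int) -> float:
    """Total fold increase after compounding a growth rate
    (3.30 means +330% per year, i.e. a 4.3x yearly multiplier)."""
    return (1.0 + annual_growth_rate) ** years


def yearly_multiplier_from_fold(fold: float, years: int) -> float:
    """Constant year-over-year multiplier implied by a total fold increase."""
    return fold ** (1.0 / years)


YEARS = 5  # 2025 to 2030, as quoted in the article

# Yearly multiplier implied by the quoted 370-fold increase.
implied_multiplier = yearly_multiplier_from_fold(370, YEARS)

# Reading A (assumption): "330%" is the growth rate, so each year is 4.3x the last.
fold_if_rate = fold_from_rate(3.30, YEARS)

# Reading B (assumption): "330%" means each year is 3.3x the last (+230% growth).
fold_if_multiplier = fold_from_rate(2.30, YEARS)

print(f"Yearly multiplier implied by 370x over 5 years: {implied_multiplier:.2f}")
print(f"Fold increase if growth rate is 330%:           {fold_if_rate:.0f}")
print(f"Fold increase if yearly multiplier is 3.3x:     {fold_if_multiplier:.0f}")
```

On this arithmetic, the quoted 370-fold figure lines up more closely with the second reading, though the report itself may use a different base or rounding.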
Hu also remains firmly optimistic about long-term trends: "In the medium to long term, regardless of statistical fluctuations on OpenRouter, global token consumption, including China's, is on a rapid growth track. Over the next two to three years, we will see increases of dozens or even hundreds of times."
The current decline in call volumes may be no more than a brief reshuffling of model choices under a price shock. The real question is not whether the AI boom is fading, but which models can hold up once tokens move from "free trials" to "real pricing" and the market votes with real money.
| National Business Daily | nbdnews | Original Article |