China has surpassed the United States for the first time, becoming the country with the highest monthly model downloads on the Hugging Face platform.
And this massive shift occurred in just one year.
This report from Hugging Face reveals the true face of the open source AI ecosystem—from the reshaping of the global competitive landscape and the rise of regional powers to new frontiers in robotics and science—presenting a complete and clear map of open source AI.
China's Rise and Ecosystem Evolution
2025 marks a watershed year for open source AI.
Hugging Face's user base has climbed to 13 million, with public models exceeding 2 million and datasets surpassing 500,000—figures that have nearly doubled from the previous year.
Behind this growth lies a deeper transformation: users are no longer just consumers of models. Increasingly, people are creating derivative works, with fine-tuned models, adapters, benchmarks, and applications emerging endlessly.
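As a concrete illustration of this derivative-work pattern, the sketch below loads an open-weight base model and attaches a community LoRA adapter on top of it with the `peft` library. The base model id is a real Hub repository; the adapter id is a hypothetical placeholder, not an actual repo.

```python
# Minimal sketch of how derivative works build on base models:
# load an open-weight base model, then apply a community LoRA adapter.
# The adapter repo id below is a placeholder, not a real repository.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-0.5B-Instruct"   # open-weight base model
adapter_id = "some-user/my-domain-lora"  # hypothetical community adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)

# PeftModel.from_pretrained attaches the LoRA weights on top of the base,
# so the derivative ships only a small adapter, not a full weight copy.
model = PeftModel.from_pretrained(base, adapter_id)
```

This is why adapters multiply so quickly: a derivative is a few megabytes of weights referencing a shared base, not a new multi-gigabyte checkpoint.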
The ecosystem's prosperity has not masked the issue of concentration.
About half of all models receive fewer than 200 downloads, while the top 200 models—representing just 0.01% of the total—capture 49.6% of all downloads.
Open source AI is better understood as multiple overlapping sub-ecosystems, each forming communities around specific domains, languages, or problem areas. Even with modest overall download numbers, these communities maintain sustained engagement and reuse.
The competitive landscape is undergoing profound adjustment.
Over 30% of Fortune 500 companies have established verified accounts on Hugging Face.
Startups treat open source models as default components. Thinking Machines built its Tinker fine-tuning service entirely on open-weight models, while mainstream IDEs like VS Code and Cursor support both open and closed source models.
Established U.S. companies like Airbnb are also increasing investment in the open source ecosystem. Hugging Face has observed more traditional companies upgrading their organizational subscriptions throughout 2025.
Big Tech's moves are even more noteworthy.
NVIDIA has become the most active contributor. Major tech companies are creating new repositories on the Hugging Face Hub, with repository growth curves clearly showing sustained investment.
Research in open source software demonstrates that the downstream value of open source outputs far exceeds their production costs. Similar patterns are emerging in AI.
Open source models are reused, adapted, and specialized by thousands of downstream applications. Organizations relying solely on closed systems often face higher costs and limited flexibility in deployment and customization.
Geopolitical shifts have been the most dramatic.
Data on cumulative downloads over the past four years shows the U.S. and China as leading contributors, with the UK, Germany, and France following closely behind.
Model developers who are individual users or distributed organizations without clear geographic attribution account for about half of total platform downloads.
But 2025 brought a fundamental shift.
Hugging Face data shows China surpassing the United States, leading in both monthly and total downloads. Over the past year, Chinese models rapidly captured 41% of the download share.
Industry's share of overall development dropped from approximately 70% before 2022 to 37% in 2025.
During the same period, independent or unaffiliated developers rose from 17% to 39%, sometimes accounting for more than half of total usage.
Individuals and small collectives focus on quantization and adapting foundation models. These intermediary groups now guide what a significant portion of users can run and how innovation spreads through the ecosystem.
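The sketch below shows the kind of quantization work these groups publish: loading a full-precision checkpoint in 4-bit with `transformers` and `bitsandbytes` so it fits on consumer hardware. The model id is illustrative, and this is one common recipe rather than the only one.

```python
# Minimal sketch: load a 7B open-weight model in 4-bit (NF4) so it runs
# on a single consumer GPU. Requires the bitsandbytes package.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 quantization scheme
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for quality
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",  # illustrative open-weight checkpoint
    quantization_config=quant_config,
    device_map="auto",
)
```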
Different regions participate in the ecosystem in different ways.
The U.S. and Western Europe historically dominated through large industrial labs like Google, Meta, OpenAI, and Stability AI, while China is increasingly leading in both releases and adoption.
France, Germany, and the UK continue contributing through research institutions, national AI initiatives, and specialized model families. Ecosystems supporting diverse contributors and organizational forms tend to produce more widely adopted results.
Popular models from startups tend to spread the most widely, and France and South Korea rank among the most competitive countries.
Notably, the fourth-largest entity developing new popular models is an individual user, not an organization. It's easier than ever for users to create competitive models.
DeepSeek R1's viral spread in January 2025 became a landmark event for China's open source wave.
Since then, both the number of competitive Chinese organizations and their repositories on Hugging Face have grown explosively.
Baidu surged from zero Hub releases in 2024 to over 100 repositories in 2025.
ByteDance and Tencent increased their releases eightfold to ninefold.
Organizations like Baidu and MiniMax, which previously favored closed-source strategies, have decisively pivoted to open source releases.
On the U.S. side, a comparable number of popular organizations have contributed repositories at a steadier, higher volume. Meta and its predecessor Facebook Research account for a significant share of open source releases, with Google also contributing, though to a lesser extent.
Viewed together, the steep upward trajectory of popular Chinese organizations' repository growth reveals a key strategic difference.
Sovereign AI and the Hardware Landscape
Open source AI is increasingly intertwined with sovereignty issues.
Open weight models allow governments and public institutions to fine-tune systems with local data under national legal frameworks.
Models deployable on domestic hardware reduce dependence on foreign-controlled cloud infrastructure. Transparency in model architecture, training processes, and evaluation supports regulatory scrutiny and public accountability.
Governments are already taking action.
South Korea's National Sovereign AI Plan launched in mid-2025, designating LG AI Research, SK Telecom, Naver Cloud, NC AI, and Upstage as national champions to produce competitive domestic models.
In February 2026, three Korean models simultaneously appeared on Hugging Face's trending list.
In March 2026, South Korea and U.S. startup Reflection AI announced a data center partnership to bring frontier open weight models to Korea.
Switzerland's AI initiative and multiple EU-funded projects reflect similar priorities. The UK's "public money, public code" principle has influenced multiple government-supported AI initiatives.
Investment is yielding returns. Models and datasets are typically used most in the regions where they're developed, with developers often choosing models that best represent their language and reflect similar technical and application needs.
The list of most popular models is also changing.
A year ago, the most liked models primarily came from Meta's Llama family in the U.S.
A year later, the list shows an international mix, with China's DeepSeek-R1 at the top. This metric doesn't necessarily reflect usage, but accumulated attention signals interest.
Regarding papers and scientific contributions, Hugging Face Daily Papers data shows that papers from large AI organizations receive broad recognition from community members.
The most upvoted papers come mainly from leading organizations in the U.S. and China.
Chinese big tech companies account for the majority, with ByteDance sharing numerous high-impact papers.
From another angle, papers that release models and datasets show broader open source adoption: medical papers carry notable influence, while the impact of big tech companies is more dispersed.
Data on derivative models reveals an interesting phenomenon.
As an organization, Alibaba has more derivative models than Google and Meta combined. The Qwen family constitutes over 113,000 derivative models. When counting all models labeled Qwen, the number swells to over 200,000.
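For readers who want to reproduce such counts, a rough sketch using the public `huggingface_hub` API follows. The `base_model` tag filter relies on the Hub's model-card metadata convention, so treat the exact filter string and the model id as assumptions.

```python
# Rough sketch: approximate derivative-model counts via the Hub API.
# The base_model tag reflects model-card metadata; exact strings may vary.
from huggingface_hub import HfApi

api = HfApi()

# Models that declare a specific Qwen checkpoint as their base.
derivatives = api.list_models(filter="base_model:Qwen/Qwen2.5-7B", limit=None)
print(sum(1 for _ in derivatives))

# A looser proxy: every model whose name or metadata mentions "qwen".
print(sum(1 for _ in api.list_models(search="qwen", limit=None)))
```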
Model development increasingly emphasizes accessibility.
Small models have far higher download and deployment rates than super-large systems, reflecting practical constraints of cost, latency, and hardware availability.
Small models dominate partly because more of them are released, but the skew persists even after normalization: relative adoption metrics from the ATOM project show that the median of the top 10 models with 1 to 9 billion parameters receives only about 4 times the downloads of models with over 100 billion parameters.
Automated systems and CI pipelines further inflate small model download counts, but the trend toward smaller deployable models is real.
User engagement with open source models typically peaks quickly after release, then slows. Average engagement duration is about 6 weeks. Continuous improvement and frequent updates are crucial for maintaining relevance.
DeepSeek's consecutive releases—V3, R1, V3.2—kept it competitive even as challengers emerged. Organizations with stagnant development quickly lose share to competitors with frequent updates or domain-specific fine-tuning.
The size of downloaded models is also changing. In 2023, the average parameter count for downloaded models was 827 million, rising to 20.8 billion in 2025, driven mainly by quantization and mixture-of-experts architectures.
The median, however, increased only slightly, from 326 million to 406 million. This divergence suggests high-end large language model users are pulling up the mean, while underlying small model usage remains stable.
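A toy calculation (with made-up numbers, not the report's data) makes the mechanism concrete: a handful of very large pulls can lift the mean by orders of magnitude while leaving the median untouched.

```python
# Toy illustration of mean/median divergence under a heavy-tailed mix:
# 95 downloads of a 400M-parameter model plus 5 downloads of huge systems.
from statistics import mean, median

# Parameter counts in millions, one entry per hypothetical download.
downloads = [400] * 95 + [100_000, 200_000, 400_000, 600_000, 700_000]

print(f"mean:   {mean(downloads):,.0f}M parameters")    # pulled up by the giants
print(f"median: {median(downloads):,.0f}M parameters")  # stays at 400M
```

Here the mean lands around 20 billion parameters while the median stays at 400 million, the same qualitative pattern the report describes.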
The performance gap between frontier models and smaller systems often narrows quickly through fine-tuning and task adaptation.
On the Hugging Face Hub, models with hundreds of millions of parameters support search, annotation, and document processing workflows, while models with single-digit billions of parameters are widely used for coding, reasoning, and multimodal tasks.
Most major model developers now release model families covering different sizes. Capable small models push autonomy to the edge, reducing dependence on centralized cloud providers.
Open source AI development is closely tied to hardware trends.
Most models are optimized for NVIDIA GPUs, but AMD hardware support continues to expand.
Stability AI's model collection is now optimized for both NVIDIA and AMD platforms. Libraries increasingly target both, with tool improvements making cross-hardware deployment more straightforward.
In 2025, Hugging Face launched the Kernel Hub, which lets developers load and run compute kernels optimized for both NVIDIA and AMD GPUs.
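A minimal sketch of the Kernel Hub workflow with the `kernels` package follows; the repository and function names mirror Hugging Face's community examples and should be read as illustrative rather than a definitive API reference.

```python
# Minimal sketch: fetch a pre-compiled compute kernel from the Hub and
# call it directly. Repo/function names follow community examples and
# are illustrative.
import torch
from kernels import get_kernel

# get_kernel downloads a kernel build matched to the local GPU; repos
# shipping both NVIDIA and AMD binaries resolve automatically.
activation = get_kernel("kernels-community/activation")

x = torch.randn(4, 1024, device="cuda", dtype=torch.float16)
out = torch.empty_like(x)
activation.gelu_fast(out, x)  # optimized GELU, writing into `out`
```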
Chinese open source models are beginning to explicitly support domestic chips.
Alibaba is investing in inference-specific chip architectures, aiming to equip Chinese data centers with hardware capable of running open source models locally.
For open weight models, computing resources remain a core requirement for development and deployment, but they are helping break up an ecosystem that was once all-or-nothing. Models are now released at every performance tier, with efficient options costing 10 to 1,000 times less than flagship models from the largest developers.
Investment in open source infrastructure remains an urgent issue. Publicly funded data centers capable of training and serving open source models have become a growing policy discussion topic, especially in Europe and the UK.
The gap between computing resources available to large closed-source model companies versus those accessible to the open source community continues to shape the boundaries of what's feasible in open source development.
New Frontiers in Robotics and Science
Robotics has become one of Hugging Face's fastest-growing sub-communities.
The numbers are impressive: robotics datasets grew from 1,145 in 2024 to 26,991 in 2025, climbing from 44th place to first in dataset categories within three years.
For comparison, the second-largest category, text generation, had only about 5,000 datasets in 2025.
Community-contributed datasets range from household manipulation tasks to autonomous driving. The largest spatial intelligence multimodal dataset, Learning to Drive, was released by LeRobot in partnership with Yaak.
Datasets like RoboMIND provide over 107,000 real-world trajectories covering 479 different tasks and various robot morphologies, offering the scale and diversity needed for training generalizable robot policies.
Hugging Face acquired Pollen Robotics, expanding sales of open source robots to industrial labs, academic labs, and hobbyists.
LeRobot, Hugging Face's open source robotics library, provides models, datasets, and PyTorch tools for real-world robots, covering imitation learning, reinforcement learning, and vision-language-action models. GitHub repository stars nearly tripled over the past year.
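A minimal sketch of pulling one of these community datasets with LeRobot: `lerobot/pusht` is a real example repository from the LeRobot organization, but exact field names vary across datasets, so treat the printed keys as indicative.

```python
# Minimal sketch: load an episodic robotics dataset from the Hub with
# LeRobot and inspect a single timestep.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

ds = LeRobotDataset("lerobot/pusht")  # manipulation episodes from the Hub
print(ds.num_episodes, len(ds))       # number of episodes and total frames

frame = ds[0]                         # one timestep: observations + actions
print(frame.keys())                   # e.g. observation.image, action, ...
```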
Scientific research is another active area. Open source models and datasets are increasingly used for protein folding, molecular dynamics, drug discovery, and scientific data analysis. All frontier AI companies now have dedicated science teams, though the current focus remains on literature discovery rather than direct experimentation.
Community-led projects form around shared research goals, often involving hundreds of contributors across institutions and disciplines. These efforts highlight open source's role as a coordination mechanism for large-scale interdisciplinary work that's difficult to organize through traditional academic or corporate structures alone.
Looking ahead, the open source AI ecosystem continues evolving through global participation, technical specialization, and institutional adoption. Several trends are likely to define the next phase.
Geographic power rebalancing is accelerating. Western organizations increasingly seek commercially viable alternatives to Chinese models. Efforts like OpenAI's GPT-OSS, AI2's OLMo, and Google's Gemma are becoming more urgent, aiming to provide competitive open source options from U.S. and European developers. Whether these efforts can match the adoption momentum of Qwen and DeepSeek will be a defining question for 2026.
Growth in robotics and science sub-communities indicates open source AI is expanding from language and image generation into physical and experimental domains. The infrastructure, norms, and coordination mechanisms developed around text and image models are adapting to new modalities and use cases.
For researchers, developers, companies, and governments, open source remains a foundational layer for building, evaluating, and governing AI systems.
As agent deployment increases, open source and its interoperability will become critical for agents to thrive.
The past year's trajectory clearly shows that the open source ecosystem is where the practical work of AI development, adaptation, and deployment largely occurs, and its influence on the broader AI landscape continues to grow.
Reference:
https://huggingface.co/blog/huggingface/state-of-os-hf-spring-2026