State of OSS AI, Spring 2026

Hugging Face dropped its Spring 2026 State of Open Source report this week. The headline numbers tell three different stories at once: a geographic shift, an institutional shift, and a concentration shift. None of them point in the direction the 2024 conventional wisdom suggested they would.

70% → 37%

In 2024, US-origin models accounted for roughly 70% of all model downloads from the Hugging Face hub. Today that share is 37%. Most of the missing 33 points went to China-origin models, which now account for 41% of downloads — more than the US for the first time.

This isn't a "China is catching up" story. It's a "China caught up and the audience moved" story. The catalog of credible Chinese open-source frontier models (Qwen-3, DeepSeek-V3, Kimi K2, Yi, Baichuan) grew faster than people tracking only US releases noticed. Hugging Face's traffic numbers are downstream of practitioners deciding which models to download for real work. Practitioners voted with their bandwidth.

The remaining 22% is distributed across European labs, independent contributors, and university releases.

Industry labs lost ground to solo devs

The second shift is less publicized but more interesting: individual contributors now ship more credible open-source models than industry labs do. Meta, Google, Anthropic, and Mistral combined produce fewer top-tier OSS releases per quarter than the long tail of solo developers and small teams.

That's partially because Meta — historically the largest OSS contributor — has slowed its frontier-model release cadence. It's partially because fine-tuning and merging recipes have gotten good enough that a single developer with a few hundred dollars of compute can produce a model that ranks competitively on a domain-specific benchmark. The barrier to "credible open-source release" dropped.

The implication for builders: when you go looking for an open-source model for a specific use case, the right place to look has shifted. The official Meta / Google / Mistral release lists are still useful but no longer comprehensive.

Concentration in 200 models

The third shift cuts the other way. Half of all downloads now go to 200 models out of the 2 million publicly hosted on Hugging Face, which is 0.01% of the catalog. The long tail of "I trained a thing once and posted it" hasn't gone away (there are more such models than ever), but the working set practitioners actually pull from is small and getting smaller.

That makes sense: most people aren't experimenting with models; they're shipping something. And shipping requires a model with a working ecosystem of fine-tunes, quantized variants, inference templates, and benchmarks. The 200 are the models that have those ecosystems. Everything else is a leaf node.
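To make the concentration claim concrete, here is a minimal Python sketch of how a "top N models cover X% of downloads" figure can be computed from per-model download counts. The function name and the toy counts are invented for illustration; this is not how the report itself derived its numbers.

```python
from typing import Iterable


def models_covering_share(downloads: Iterable[int], share: float = 0.5) -> int:
    """Return the smallest number of top models whose downloads reach `share` of the total."""
    counts = sorted(downloads, reverse=True)
    total = sum(counts)
    if total == 0:
        return 0
    running = 0
    for rank, count in enumerate(counts, start=1):
        running += count
        if running >= share * total:
            return rank
    return len(counts)


# Toy data, invented for illustration: two popular models and a long tail of 300.
toy_counts = [400, 300] + [1] * 300
print(models_covering_share(toy_counts))  # 2

# The report's headline ratio: 200 models out of 2 million hosted.
print(f"{200 / 2_000_000:.2%}")  # 0.01%
```

In the toy data, two models out of 302 already cover half the downloads. The report describes the same shape at hub scale.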

What to do this quarter

Three actions that follow from the report.

Audit your model defaults. If your 2024 default was meta-llama/Llama-3 or some variant, ask whether that's still the right baseline. The Qwen-3 family is now ahead on most multilingual tasks. DeepSeek-V3 is ahead on reasoning. Kimi K2 is ahead on long context. The right answer depends on your workload, but it almost certainly isn't the 2024 default anymore.
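One low-effort way to start that audit is to find where the old default is pinned. The sketch below scans config text for model ids that begin with a prefix you consider stale. The prefix list, the function name, and the sample config (including both model ids in it) are placeholders, not real repository names; decide for yourself what counts as stale.

```python
import re

# Assumption: which prefixes count as a "2024 default" is a local decision.
STALE_PREFIXES = ("meta-llama/Llama-3",)


def find_stale_defaults(text, prefixes=STALE_PREFIXES):
    """Return (line_number, model_id) pairs for every stale model id found in `text`."""
    # Match each prefix plus any trailing id characters (letters, digits, _, ., -).
    pattern = re.compile("|".join(re.escape(p) + r"[\w.-]*" for p in prefixes))
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Hypothetical config; both ids below are made-up placeholders.
sample_config = """\
embedder: some-org/some-embedder
generator: meta-llama/Llama-3-example-variant
"""
print(find_stale_defaults(sample_config))
```

This only locates the pins. Whether a replacement is actually better still has to be measured on your own workload.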

Don't assume small-team releases are less mature. A fine-tune from a known practitioner on a domain you care about can outperform an industry lab's generalist release by a wide margin. The "industry lab = best quality" heuristic isn't reliable in spring 2026.

Track the 200, not the 2 million. Hugging Face's filtered top-200 list, updated weekly, is now the right working set to monitor. Anything outside it is either niche or experimental — both legitimate, but not where most production work should start.
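Monitoring a weekly list mostly comes down to noticing what entered and what dropped out. The sketch below assumes you have already saved two weekly snapshots as plain lists of model ids. How you fetch the list is left out, and the ids shown are placeholders.

```python
def diff_snapshots(previous, current):
    """Compare two lists of model ids and report which entered and which dropped."""
    prev, curr = set(previous), set(current)
    return {
        "entered": sorted(curr - prev),
        "dropped": sorted(prev - curr),
    }


# Placeholder ids, invented for illustration.
last_week = ["org-a/model-1", "org-b/model-2", "org-c/model-3"]
this_week = ["org-b/model-2", "org-c/model-3", "org-d/model-4"]

changes = diff_snapshots(last_week, this_week)
print(changes["entered"])  # ['org-d/model-4']
print(changes["dropped"])  # ['org-a/model-1']
```

A model that enters the list and stays for several weeks is a stronger signal than a one-week spike, so it is worth keeping the snapshots rather than just the latest diff.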

The numbers — 70 to 37, China at 41, 50% on 200 models, 13M users — are useful as quotes. The actionable bit is the underlying shift in who you should trust for your next model choice. The 2024 answer was "the big four labs." The spring 2026 answer is "look harder."