The share of models disclosing their training data fell from ~79% (2022) to ~39% (2025). And for the first time, open-weight releases outnumber truly open-source ones, raising accountability and reproducibility concerns.
Community repos that quantize, repack, and adapt base models now drive a large fraction of real-world usage.
MLX-Community, SD Concepts Library, LMStudio-Community, and others are consolidating models for deployment and artistic adaptation.
Average downloaded model size rose 17×, alongside 7× growth in MoE adoption and 5× in quantization; multimodal and video downloads grew ~3.4×.
US big tech's (Google/Meta/OpenAI) dominance has dissipated while community/unaffiliated devs surged, and Chinese industry (DeepSeek, Qwen) now commands a major share, hinting at a new consolidation wave among open-weight models.
📄 Paper: dataprovenance.org/economies-of...
We also release a Dashboard: huggingface.co/spaces/econo...
Full paper: arxiv.org/pdf/2510.22037
Huge thanks to my brilliant co-authors: Sneha, Niklas, I-Hung, Isaac, Sandy, Sercan, Chen-Yu, and Sayna!
🌟Answer: We found compute-optimal crossover points for every model size.
Rough rule of thumb: finetune if your compute budget C < 10^10 × N^1.54, otherwise pretrain.
8/
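A minimal sketch of how to apply this rule of thumb. The constant 10^10 and exponent 1.54 are the quoted crossover fit; reading C in FLOPs and N in parameters is my assumption:

# Minimal sketch of the finetune-vs-pretrain rule of thumb above.
# Units (FLOPs, parameter count) are an assumption; the constant 1e10
# and exponent 1.54 are the quoted crossover fit.
def recommend_strategy(compute_budget_flops: float, n_params: float) -> str:
    """Return 'finetune' below the compute-optimal crossover, else 'pretrain'."""
    crossover = 1e10 * n_params ** 1.54
    return "finetune" if compute_budget_flops < crossover else "pretrain"

# Example: a 1B-parameter model with a 1e21-FLOP budget.
# Crossover = 1e10 * (1e9)**1.54 ≈ 7e23 FLOPs, so this budget says: finetune.
print(recommend_strategy(1e21, 1e9))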
The curse is real but quantifiable: ϕ=0.11 (capacity penalty), ψ=-0.04 (data benefit from transfer).
7/
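For intuition, here is one common way such exponents enter a multilingual scaling law. This functional form is an illustrative assumption on my part, not necessarily ATLAS's exact parameterization:

L(N, D, K) \approx E + A\,(N K^{-\phi})^{-\alpha} + B\,(D K^{-\psi})^{-\beta}

With ϕ=0.11 > 0, effective capacity per language shrinks as the language count K grows (the curse); with ψ=-0.04 < 0, the factor K^{-ψ} = K^{0.04} exceeds 1, so effective data grows with K (the transfer benefit).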
🌟Answer: We derived closed-form equations! To go from K to 4K languages while maintaining performance: scale data by 2.74×, model size by 1.4×.
6/
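A short sketch that back-solves the implied power-law exponents from the 4× figures above and extrapolates; the power-law form beyond the 4× point is my assumption:

import math

# Back-solve exponents from the quoted 4x-languages multipliers:
# data must scale by 2.74x and model size by 1.4x.
p_data = math.log(2.74) / math.log(4)   # ≈ 0.73
p_model = math.log(1.4) / math.log(4)   # ≈ 0.24

def iso_performance_scaling(lang_multiplier: float) -> tuple[float, float]:
    """(data multiplier, model multiplier) needed to hold performance while
    multiplying the number of languages; extrapolation is assumed."""
    return lang_multiplier ** p_data, lang_multiplier ** p_model

print(iso_performance_scaling(4.0))   # (2.74, 1.4), recovering the quoted numbers
print(iso_performance_scaling(16.0))  # ≈ (7.51, 1.96) under the assumed power law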
Languages sharing writing systems (e.g., Latin) show dramatically better transfer (mean: -0.23) vs different scripts (mean: -0.39).
Also important: transfer is often asymmetric—A helping B ≠ B helping A.
5/
🌟Answer: We measure this empirically. We built a 38×38 transfer matrix, or 1,444 language pairs—the largest such resource to date.
We highlight the top 5 most beneficial source languages for each target language.
4/
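A minimal sketch of how such a matrix can be queried for the top-5 sources per target. The random values, variable names, and sign convention (higher score = more beneficial, consistent with the script-sharing means above) are illustrative assumptions, not the paper's released data:

import numpy as np

langs = [f"lang_{i}" for i in range(38)]         # placeholders for the 38 languages
rng = np.random.default_rng(0)
transfer = rng.normal(-0.3, 0.1, size=(38, 38))  # transfer[src, tgt]; note it is
                                                 # asymmetric: [a, b] != [b, a]
np.fill_diagonal(transfer, np.nan)               # exclude self-transfer

def top_sources(tgt: int, k: int = 5) -> list[str]:
    """Top-k most beneficial source languages for a given target language."""
    scores = transfer[:, tgt]
    ranked = [i for i in np.argsort(scores)[::-1] if not np.isnan(scores[i])]
    return [langs[i] for i in ranked[:k]]

print(top_sources(0))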
Without modeling transfer, existing scaling laws fail in multilingual settings.
3/
🌟Answer: Yes! ATLAS outperforms prior work with R²(N)=0.88 vs 0.68, and R²(M)=0.82 vs 0.69 for mixture generalization.
2/
He has done some of the best research on fine-grained, scalable, and human-aligned LLM-as-a-judge evaluation.
➡️ Flask
➡️ Prometheus 1 & 2
➡️ Multilingual Prometheus
➡️ KMMLU
➡️ BigGen Bench