#MLPerf
AI Training Benchmarks Push Hardware Limits - IEEE Spectrum

https://spectrum.ieee.org/mlperf-trends
AI Model Growth Outpaces Hardware Improvements
Since 2018, the consortium MLCommons has been running a sort of Olympics for AI training. The competition, called MLPerf, consists of a set of tasks for training specific AI models, on predefined datasets, to a certain accuracy. Essentially, these tasks, called benchmarks, test how well a hardware and low-level software configuration is set up to train a particular AI model. Twice a year, companies put together their submissions—usually, clusters of CPUs and GPUs and software optimized for them—and compete to see whose submission can train the models fastest.

There is no question that since MLPerf’s inception, the cutting-edge hardware for AI training has improved dramatically. Over the years, Nvidia has released four new generations of GPUs that have since become the industry standard (the latest, Nvidia’s Blackwell GPU, is not yet standard but growing in popularity). The companies competing in MLPerf have also been using larger clusters of GPUs to tackle the training tasks.

However, the MLPerf benchmarks have also gotten tougher. And this increased rigor is by design—the benchmarks are trying to keep pace with the industry, says David Kanter, head of MLPerf. “The benchmarks are meant to be representative,” he says.

Intriguingly, the data show that the large language models and their precursors have been increasing in size faster than the hardware has kept up. So each time a new benchmark is introduced, the fastest training time gets longer. Then, hardware improvements gradually bring the execution time down, only to get thwarted again by the next benchmark. Then the cycle repeats itself.

_This article appears in the November 2025 print issue._
spectrum.ieee.org
November 15, 2025 at 4:02 PM
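A rough way to see the sawtooth pattern the Spectrum piece describes is to compare two exponential curves: benchmark training compute that jumps with each new model, and cluster throughput that grows more slowly. The short Python sketch below does that; every growth rate and starting value in it is an invented placeholder, not data from the article.

```python
# Hypothetical sketch of the benchmark "sawtooth" described in the Spectrum excerpt above.
# Every number here is an invented placeholder, not data from the article.

COMPUTE_GROWTH_PER_BENCHMARK = 8.0   # assumed growth in required training compute per benchmark generation
HW_GROWTH_PER_YEAR = 2.5             # assumed yearly growth in cluster throughput
YEARS_PER_BENCHMARK = 2              # assume a bigger model replaces the benchmark every two years

def fastest_time_to_train(year: float) -> float:
    """Best achievable time-to-train (arbitrary units) = required compute / available throughput."""
    generation = int(year // YEARS_PER_BENCHMARK)
    compute = COMPUTE_GROWTH_PER_BENCHMARK ** generation
    throughput = HW_GROWTH_PER_YEAR ** year
    return compute / throughput

# Sample twice a year: the best time falls as hardware improves, then jumps each
# time a larger benchmark model lands, which is the cycle the article describes.
for half_year in range(13):
    year = half_year / 2
    print(f"year {year:4.1f}   fastest time ≈ {fastest_time_to_train(year):6.2f}")
```

With these made-up rates, hardware improvements cut the best time by 2.5x per year between benchmark updates, but each new benchmark multiplies the required compute by 8x, so the envelope of fastest times drifts slowly upward, matching the trend Spectrum reports.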
Blackwell Sweeps MLPerf, How to Achieve 4x Faster Inference for Math Problem Solving, and More
www.linkedin.com/pulse/blackw...
Blackwell Sweeps MLPerf, How to Achieve 4x Faster Inference for Math Problem Solving, and More
Welcome to your weekly drop of developer news. Subscribe for the latest technical deep dives, resources, trainings, and more.
www.linkedin.com
November 14, 2025 at 8:16 PM
NVIDIA Wins Every MLPerf Training v5.1 Benchmark
www.linkedin.com/pulse/nvidia...
NVIDIA Wins Every MLPerf Training v5.1 Benchmark
Follow @NVIDIANewsroom for real-time updates from NVIDIA. NVIDIA Wins Every MLPerf Training v5.
www.linkedin.com
November 14, 2025 at 5:02 PM
Just saw NVIDIA’s Blackwell crush every MLPerf Training v5.1 benchmark using FP4 precision – even outpacing FP16 on Llama 3.1’s 405‑billion‑parameter model. The future of GPU AI is here. Dive in for the full breakdown! #NVIDIABlackwell #MLPerf #FP4

🔗 aidailypost.com/news/nvidia-...
November 13, 2025 at 9:23 PM
Large language models #LLMs are growing extremely quickly, and the #hardware systems that they require can’t keep up with the pace. Each time #MLPerf introduces a new benchmark, training time increases. The data tells the story. spectrum.ieee.org/mlperf-trends
November 13, 2025 at 8:30 PM
Just need to brag for a moment about my team who designed the Theia #HPC cluster, as well as all our colleagues who built it, got it to perf, and keep it happy and running smoothly
November 13, 2025 at 2:59 PM
NVIDIA Completes Training of the Llama 3.1 405B Model in About 10 Minutes – Record Set in MLPerf Training v5.1 Tests
#ITNews
ITちゃんねる
NVIDIA Completes Training of the Llama 3.1 405B Model in About 10 Minutes – Record Set in MLPerf Training v5.1 Tests #ITNews
it.f-frontier.com
November 13, 2025 at 1:14 PM
📰 NVIDIA Dominates MLPerf Training v5.1, Winning Every Benchmark

👉 Read the full article here: https://ahmandonk.com/2025/11/13/nvidia-menang-mlperf-training-v51/

#ai
#training #blackwell #gpu #llama #mlperf #nvidia
November 13, 2025 at 9:21 AM
NVIDIA's Blackwell architecture sets a new standard in AI training efficiency with MLPerf Training v5.1, demonstrating impressive results across all benchmark tests.
November 13, 2025 at 3:03 AM
NVIDIA sets new AI records in the MLPerf Training v5.1 benchmarks, emphasizing the Blackwell Ultra GPUs' innovative use of NVFP4 precision and unprecedented LLM training speeds. More at: https://blogs.nvidia.com/blog/mlperf-training-benchmark-blackwell-ultra/
November 13, 2025 at 3:00 AM
NVIDIA's Blackwell Ultra GPU architecture led the way in MLPerf Training v5.1, demonstrating enhanced AI training capabilities across various model types.
November 13, 2025 at 2:48 AM
NVIDIA Blackwell Ultra Redefines AI Training: A Quantum Leap in MLPerf v5.1 Performance

Introduction: The Race to Smarter AI Infrastructure In the ever-accelerating era of artificial intelligence, the race is no longer about who builds the largest model — it’s about who trains it fastest, most…
NVIDIA Blackwell Ultra Redefines AI Training: A Quantum Leap in MLPerf v5.1 Performance
Introduction: The Race to Smarter AI Infrastructure In the ever-accelerating era of artificial intelligence, the race is no longer about who builds the largest model — it’s about who trains it fastest, most efficiently, and at scale. The demand for high-speed, high-efficiency computing has exploded as AI reasoning grows more complex and data-hungry. Every layer of technology — from GPUs and CPUs to networking and algorithms — is being pushed to its limits.
undercodenews.com
November 12, 2025 at 8:32 PM
NVIDIA Wins Every MLPerf Training v5.1 Benchmark https://blogs.nvidia.com/blog/mlperf-training-benchmark-blackwell-ultra/
In the age of AI reasoning, training smarter, more capable models is critical to scaling intelligence. Delivering the massive performance to meet this new age requires breakthroughs across GPUs, CPUs, NICs, scale-up and scale-out networking, system architectures, and mountains of software and algorithms.

In MLPerf Training v5.1 — the latest round in a long-running series of industry-standard tests of AI training performance — NVIDIA swept all seven tests, delivering the fastest time to train across large language models (LLMs), image generation, recommender systems, computer vision and graph neural networks. NVIDIA was also the only platform to submit results on every test, underscoring the rich programmability of NVIDIA GPUs, and the maturity and versatility of its CUDA software stack.

## **NVIDIA Blackwell Ultra Doubles Down**

The GB300 NVL72 rack-scale system, powered by the NVIDIA Blackwell Ultra GPU architecture, made its debut in MLPerf Training this round, following a record-setting showing in the most recent MLPerf Inference round. Compared with the prior-generation Hopper architecture, the Blackwell Ultra-based GB300 NVL72 delivered more than 4x the Llama 3.1 405B pretraining and nearly 5x the Llama 2 70B LoRA fine-tuning performance using the same number of GPUs.

These gains were fueled by Blackwell Ultra's architectural improvements — including new Tensor Cores that offer 15 petaflops of NVFP4 AI compute, twice the attention-layer compute and 279GB of HBM3e memory — as well as new training methods that tapped into the architecture's enormous NVFP4 compute performance.

Connecting multiple GB300 NVL72 systems, the NVIDIA Quantum-X800 InfiniBand platform — the industry's first end-to-end 800 Gb/s scale-up networking platform — also made its MLPerf debut, doubling scale-out networking bandwidth compared with the prior generation.

## **Performance Unlocked: NVFP4 Accelerates LLM Training**

Key to the outstanding results this round was performing calculations using NVFP4 precision — a first in the history of MLPerf Training.

One way to increase compute performance is to build an architecture capable of performing computations on data represented with fewer bits, and then to perform those calculations at a faster rate. However, lower precision means less information is available in each calculation. This means using low-precision calculations in the training process calls for careful design decisions to keep results accurate.

NVIDIA teams innovated at every layer of the stack to adopt FP4 precision for LLM training. The NVIDIA Blackwell GPU can perform FP4 calculations — including the NVIDIA-designed NVFP4 format as well as other FP4 variants — at double the rate of FP8. Blackwell Ultra boosts that to 3x, enabling the GPUs to deliver substantially greater AI compute performance. NVIDIA is the only platform to date that has submitted MLPerf Training results with calculations performed using FP4 precision while meeting the benchmark's strict accuracy requirements.

## **NVIDIA Blackwell Scales to New Heights**

NVIDIA set a new Llama 3.1 405B time-to-train record of just 10 minutes, powered by more than 5,000 Blackwell GPUs working together efficiently. This entry was 2.7x faster than the best Blackwell-based result submitted in the prior round, resulting from efficient scaling to more than twice the number of GPUs, as well as the use of NVFP4 precision to dramatically increase the effective performance of each Blackwell GPU.

To illustrate the performance increase per GPU, NVIDIA submitted results this round using 2,560 Blackwell GPUs, achieving a time to train of 18.79 minutes — 45% faster than the submission last round using 2,496 GPUs.

## **New Benchmarks, New Records**

NVIDIA also set performance records on the two new benchmarks added this round: Llama 3.1 8B and FLUX.1. Llama 3.1 8B — a compact yet highly capable LLM — replaced the long-running BERT-large model, adding a modern, smaller LLM to the benchmark suite. NVIDIA submitted results with up to 512 Blackwell Ultra GPUs, setting the bar at 5.2 minutes to train.

In addition, FLUX.1 — a state-of-the-art image generation model — replaced Stable Diffusion v2, with only the NVIDIA platform submitting results on the benchmark. NVIDIA submitted results using 1,152 Blackwell GPUs, setting a record time to train of 12.5 minutes.

NVIDIA continued to hold records on the existing graph neural network, object detection and recommender system tests.

## **A Broad and Deep Partner Ecosystem**

The NVIDIA ecosystem participated extensively this round, with compelling submissions from 15 organizations including ASUSTeK, Dell Technologies, Giga Computing, Hewlett Packard Enterprise, Krai, Lambda, Lenovo, Nebius, Quanta Cloud Technology, Supermicro, University of Florida, Verda (formerly DataCrunch) and Wiwynn.

NVIDIA is innovating at a one-year rhythm, driving significant and rapid performance increases across pretraining, post-training and inference — paving the way to new levels of intelligence and accelerating AI adoption.

_See more NVIDIA performance data on the Data Center Deep Learning Product Performance Hub and Performance Explorer pages._
blogs.nvidia.com
November 12, 2025 at 6:47 PM
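The NVIDIA post above credits much of the speedup to NVFP4, a 4-bit floating-point format, and notes that low-precision training "calls for careful design decisions to keep results accurate", but it doesn't spell out how. A common ingredient in 4-bit formats is block-wise scaling: small groups of values share a scale factor, so outliers in one group don't wipe out precision everywhere else. The NumPy sketch below illustrates only that generic idea; the E2M1 value grid, the block size of 16, and the max-abs scaling rule are assumptions for illustration, not NVIDIA's NVFP4 specification.

```python
import numpy as np

# Magnitudes representable in a 4-bit E2M1 (FP4) float; the sign bit adds the negatives.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block_fp4(x: np.ndarray, block_size: int = 16):
    """Quantize a 1-D array to FP4 values with one shared scale per block.

    Returns (values snapped to the FP4 grid, per-block scales);
    dequantize by multiplying each block by its scale.
    """
    assert x.size % block_size == 0
    blocks = x.reshape(-1, block_size)
    # Scale each block so its largest magnitude lands on FP4's maximum value (6.0).
    scales = np.abs(blocks).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scales = np.where(scales == 0, 1.0, scales)           # guard against all-zero blocks
    scaled = blocks / scales
    # Snap each value to the nearest representable FP4 magnitude, preserving sign.
    nearest = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(scaled) * FP4_GRID[nearest], scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Undo the block scaling and flatten back to 1-D."""
    return (q * scales).reshape(-1)

# Round-trip a random tensor and look at the worst-case error.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)
q, s = quantize_block_fp4(x)
print(f"max abs round-trip error: {np.abs(dequantize(q, s) - x).max():.4f}")
```

The round-trip error stays small relative to each block's dynamic range, which is why per-block scales let 4-bit values cover a wide range at all; real FP4 training recipes additionally keep master weights, optimizer state and sensitive layers in higher precision.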
MLPerf Training v5.1 takeaways: GenAI drives, scale ramps, field widens. Two timely swaps—Llama 3.1 8B and Flux.1—match real workloads. Here’s our take: techarena.ai/content/mlpe...
MLPerf Training v5.1: GenAI Drives, Scale Ramps, Field Widens - Articles
Two new genAI tests (Llama 3.1 8B, Flux.1) align with production stacks as multi-node results climb. NVIDIA posts many fastest times; University of Florida, Wiwynn, and Datacrunch expand the ecosystem...
techarena.ai
November 12, 2025 at 5:41 PM
NVIDIA Wins Every MLPerf Training v5.1 Benchmark | NVIDIA Blog
NVIDIA Wins Every MLPerf Training v5.1 Benchmark
In MLPerf Training v5.1, NVIDIA swept all seven tests, delivering the fastest time to train across LLMs, image generation, recommender systems, computer vision and graph neural networks.
blogs.nvidia.com
November 12, 2025 at 4:14 PM
MLPerf Training v5.1 results are live!
Record participation: 20 organizations submitted 65 unique systems featuring 12 different accelerators. Multi-node submissions increased 86% over last year, showing the industry's focus on scale.
Results: mlcommons.org/2025/11/trai...
#MLPerf
1/3
November 12, 2025 at 4:06 PM
MLPerf Training 5.1: Nvidia wins everything, but AMD finally arrives with partners
MLPerf Training 5.1: Nvidia wins everything, but AMD finally arrives with partners
While Nvidia once again tried to claim the new MLPerf Training results for itself, AMD is also steadily shining brighter.
www.computerbase.de
November 12, 2025 at 4:00 PM
- Management said CoreWeave “delivered many of the initial scale deployments of GB200s” and is first to market with GB300s; it’s also the only cloud that submitted MLPerf inference results for GB300s—strong proof they’re winning on performance for frontier + enterprise inference/training.
November 11, 2025 at 4:02 PM
AI Cloud: Delivered $1.4B revenue with a $55B backlog and new deals with Meta, OpenAI, and a sixth hyperscaler. Deployed GB200/GB300 clusters, becoming the only cloud to submit MLPerf inference benchmarks on GB300.
November 11, 2025 at 12:03 PM
Azure's new ND GB300 VM is crushing it—1.1M tokens/sec on Llama2 70B with FP4, 50% more GPU memory, and TensorRT-LLM tuned for MLPerf v5.1. Curious how this stacks up for your AI workloads? Dive in! #AzureNDGB300 #Llama2_70B #TensorRTLLM

🔗 aidailypost.com/news/microso...
November 4, 2025 at 5:55 AM
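A quick sanity check on the Azure figure above: if the 1.1M tokens/sec came from a single GB300 NVL72 rack (72 GPUs, per the NVIDIA post earlier in this feed), per-GPU throughput works out to roughly 15K tokens/sec. The 72-GPU count is an assumption; the post doesn't say how large the system was.

```python
# Back-of-the-envelope per-GPU throughput for the ND GB300 claim above.
# ASSUMPTION: the 1.1M tokens/sec spans one GB300 NVL72 rack (72 GPUs);
# the post does not state the actual system size.
total_tokens_per_sec = 1_100_000
assumed_gpus = 72
print(f"~{total_tokens_per_sec / assumed_gpus:,.0f} tokens/sec per GPU")
```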
As AI hardware gets more powerful, #AI models get more demanding. Just like every other piece of software: spectrum.ieee.org/mlperf-trends #ArtificialIntelligence
AI Model Growth Outpaces Hardware Improvements
AI training races are heating up as benchmarks get tougher.
spectrum.ieee.org
November 4, 2025 at 4:08 AM
IEEE Spectrum's analysis shows how our benchmarks capture the real industry challenge - LLMs scale exponentially while hardware improves incrementally.
This pattern highlights why evolving benchmarks are important. Stay tuned for MLPerf Training v5.1, out on 11/12.

spectrum.ieee.org/mlperf-trends
AI Model Growth Outpaces Hardware Improvements
AI training races are heating up as benchmarks get tougher.
spectrum.ieee.org
November 3, 2025 at 5:54 PM
Radoyeh Shojaei, Predrag Djurdjevic, Mostafa El-Khamy, James Goel, Kasper Mecklenburg, John Owens, Pınar Muyan-Özçelik, Tom St. John, Jinho Suh, Arjun Suresh: MLPerf Automotive https://arxiv.org/abs/2510.27065 https://arxiv.org/pdf/2510.27065 https://arxiv.org/html/2510.27065
November 3, 2025 at 6:33 AM
..assumes infinite compute growth. What happens when that assumption breaks?

Read the full analysis: https://spectrum.ieee.org/mlperf-trends

(5/5)
November 1, 2025 at 8:36 AM