Glenn K. Lockwood
@glennklockwood.mast.hpc.social.ap.brid.gy
I am a supercomputing enthusiast, but I usually don't know what I'm talking about. I post about large-scale infrastructure for #HPC and #AI.

🌉 bridged from https://mast.hpc.social/@glennklockwood on the fediverse by https://fed.brid.gy/
It was tradition at NERSC for the director to give everyone a half-day off on the Wednesday before Thanksgiving. By comparison, VAST has no company holidays, so technically, nobody gets Thanksgiving off (much less the half day before it!).
November 26, 2025 at 9:30 PM
Reposted by Glenn K. Lockwood
Finally, #jupiter crossed the 1 ExaFLOP/s threshold today. The list is lying to you, though: it's not exactly 1000 PFLOP/s, it's 1000.184 PFLOP/s; the rest got lost to rounding.
The 184 TFLOP/s are pretty much exactly the same as the previous #jsc […]

[Original post on mastodon.social]
November 17, 2025 at 8:24 PM
Andreas Dilger is now working for The Lustre Collective (https://thelustrecollective.com) after leaving DDN. I am glad to see his leadership continue to drive Lustre into the future. Say what you will about it, Lustre is the standard to which every other #hpc file system is compared.

#sc25
November 18, 2025 at 11:51 PM
When I spent a Thanksgiving week after SC writing the non-MPI layer for Darshan years ago, I thought to myself “surely this work will make me famous!”

I guess my ship finally came in at the PDSW keynote by Rob Ross.

#sc25
November 17, 2025 at 3:27 PM
Apparently I was the first DAOS user to complain about having to refer to DAOS containers by UUIDs, so they added container labels. Don’t know if this is completely true, but I remember voicing this and will accept the credit if Mohamad is willing to give it to me 🙂

(Learned this at my own […]
Original post on mast.hpc.social
November 16, 2025 at 4:43 PM
New SC record: ran into a colleague within 2 minutes of walking into the airport terminal from the curb. Been catching up nonstop straight through takeoff. Conference starts earlier and earlier every year.
November 15, 2025 at 6:27 PM
Chatting with a pal reminded me of a fun pre-SC activity: looking back at old conference takes that aged like milk. Remember this one?

#hpc #zettascale #hedoesntworkthereanymore
November 14, 2025 at 10:40 PM
🎶 It's the most wonderful time of the year 🎶

#sc25
November 14, 2025 at 8:22 PM
VAST and CoreWeave just announced a >$1.1 billion partnership to deliver #ai data services. Mind you, that's a billion in services, not GPUs. Though I can't claim any credit, I'm proud to work for a company that's earned this level of trust from a partner […]
Original post on mast.hpc.social
November 7, 2025 at 12:13 AM
SC25 will be my 12th SC (10th in-person). I've attended and presented on behalf of SDSC, NERSC, and Microsoft before, but I've got to say: this year has been the most work and the most stress I've ever had around the conference.
November 5, 2025 at 7:23 PM
Google recently posted a promo for using their managed #lustre service to accelerate inferencing via KV caching. Raises questions:

1. Whatever happened to Google Managed #daos (ParallelStore)? It performs better than Lustre.

2. Does Gemini use this? Unlikely. See […]
Original post on mast.hpc.social
November 4, 2025 at 4:41 PM
This is like showing up with a new boyfriend the week after the divorce. At least Microsoft is still getting those alimony payments.

https://www.aboutamazon.com/news/aws/aws-open-ai-workloads-compute-infrastructure
November 3, 2025 at 4:11 PM
NVIDIA, Oracle, and US DOE are named in the headline. Argonne is not. I don’t think this is an Argonne system.

https://nvidianews.nvidia.com/news/nvidia-oracle-us-department-of-energy-ai-supercomputer-scientific-discovery
October 30, 2025 at 3:33 PM
"AMD Powers U.S. Sovereign AI Factory Supercomputers" - What exactly is "US sovereign AI?" I think the whole point of "sovereign AI" is "not dependent on the USA." All AI is, by default, sovereign to the US […]
Original post on mast.hpc.social
October 30, 2025 at 2:32 AM
A few more details on ATS-5/Mission #hpc to be installed at LANL. Confirmed as next-gen Cray GX5000 with Vera Rubin + XDR InfiniBand. Confirms that the GX5000 can do both InfiniBand and Slingshot++.

The messaging is funny; it "will build on the success" of LANL's Venado (non-ATS; GH200) system, but […]
Original post on mast.hpc.social
October 29, 2025 at 6:04 PM
Kinda funny that OpenAI owns less of OpenAI than Microsoft does.

OpenAI (the nonprofit) holds a $130B equity stake in OpenAI (the public benefit corporation) while Microsoft holds $135B.

https://openai.com/index/built-to-benefit-everyone/
October 29, 2025 at 1:47 AM
NVIDIA announced seven new DOE #hpc systems for #ai, including a 100K Blackwell system at Argonne. Oracle is a partner, just like with yesterday's OLCF-6/Discovery announcement.

Details on the ALCF systems are scant. This procurement was out of band of the Aurora follow-on.

The LANL systems […]
Original post on mast.hpc.social
October 28, 2025 at 6:23 PM
Article implies AMD is partly paying DOE for the #hpc systems being deployed at ORNL: “The Department of Energy will host the computers, the companies will provide the machines and capital spending”

Is AMD buying market share, similar to its arrangement with OpenAI? […]
Original post on mast.hpc.social
October 28, 2025 at 12:02 AM
Glad to see DAOS getting another round of DOE investment.

Highlights a widening gap between government AI infra and commercial AI infra. I have never met an AI customer who's found relevance in DAOS. Maybe .gov can change that […]
Original post on mast.hpc.social
October 27, 2025 at 11:24 PM
I published my first (of many) technical blogs for VAST. This one gives a quantitative, real-world perspective on how much checkpoint bandwidth is required to train trillion-parameter-scale models (hint: less than many have suggested) […]
Original post on mast.hpc.social
October 23, 2025 at 1:23 AM
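The sizing arithmetic behind that claim can be sketched in a few lines. All of the numbers below (bytes per parameter, checkpoint interval, write-time budget) are my own illustrative assumptions, not figures from the blog post:

```python
# Back-of-envelope checkpoint bandwidth for a 1T-parameter model.
# Assumptions (illustrative only): bf16 weights (2 B/param) plus fp32
# master weights and Adam moments (12 B/param), checkpointed every
# 30 minutes, spending at most 5% of wall time writing.

params = 1.0e12                 # one trillion parameters
bytes_per_param = 14            # 2 (bf16) + 12 (fp32 optimizer state)
ckpt_bytes = params * bytes_per_param          # ~14 TB per checkpoint

interval_s = 30 * 60            # assumed checkpoint interval: 30 min
write_budget = 0.05             # assumed: <=5% of runtime spent writing

# Required aggregate write bandwidth across the whole job.
bandwidth = ckpt_bytes / (interval_s * write_budget)
print(f"checkpoint size: {ckpt_bytes / 1e12:.1f} TB")
print(f"required write bandwidth: {bandwidth / 1e9:.0f} GB/s")
```

With these assumptions the answer lands near 156 GB/s aggregate, which is modest per-node once sharded across thousands of ranks; the real analysis in the blog post will differ with its own parameters.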
KV caching for LLM inference is just a specialized memoization problem, which is not uncommon across #hpc, and its implementation remains pretty primitive but is improving quickly. Here's an example of scavenging "perforated" results of memoized attention layers […]
Original post on mast.hpc.social
October 21, 2025 at 5:06 PM
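The memoization framing can be made concrete with a toy sketch. This is a hypothetical, dependency-free illustration (real KV caches hold per-layer key/value tensors on the GPU, not Python tuples): extending a previously seen prompt only computes state for the new suffix.

```python
# Toy illustration of KV caching as memoization (not a real inference
# engine): per-token "attention state" is cached keyed by token prefix,
# so re-running an extended prompt only computes the new tail.

class KVCache:
    def __init__(self):
        self._cache = {}  # token prefix (tuple) -> list of per-token states

    def attend(self, tokens):
        """Return per-token states for `tokens`, reusing any cached prefix.

        Also returns how many tokens actually required fresh computation.
        """
        tokens = tuple(tokens)
        # Find the longest previously computed prefix of this sequence.
        best = ()
        for prefix in self._cache:
            if len(prefix) > len(best) and tokens[:len(prefix)] == prefix:
                best = prefix
        states = list(self._cache.get(best, []))
        computed = 0
        # Only the suffix beyond the cached prefix needs new work.
        for i in range(len(best), len(tokens)):
            states.append(self._fake_kv(tokens[i]))
            computed += 1
        self._cache[tokens] = states
        return states, computed

    @staticmethod
    def _fake_kv(token):
        # Stand-in for computing one token's key/value pair.
        return ("kv", token)
```

A first call with `[1, 2, 3]` computes three states; a second call with `[1, 2, 3, 4]` computes only one, which is exactly the prefix-reuse property that production KV caches exploit.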
Love to see DAOS getting mainstream attention, but it's a long road ahead before it can compete in the GPU cloud market. Everyone says their performance is the best; that doesn't differentiate storage for #ai. It's the hard problems - integration, reliability, etc […]
Original post on mast.hpc.social
October 16, 2025 at 6:23 PM
I'll be talking about how hyperscale #ai resembles #hpc workflows at an #sc25 Exhibitor Forum session on Thurs 11/20.

Excited to be on stage at SC again. Aiming to maintain the same level of technical quality as before I left .gov.

Details […]
Original post on mast.hpc.social
October 16, 2025 at 12:28 AM