Glenn K. Lockwood
@glennklockwood.mast.hpc.social.ap.brid.gy
I am a supercomputing enthusiast, but I usually don't know what I'm talking about. I post about large-scale infrastructure for #HPC and #AI.

🌉 bridged from https://mast.hpc.social/@glennklockwood on the fediverse by https://fed.brid.gy/
It was tradition at NERSC for the director to give everyone a half-day off on the Wednesday before Thanksgiving. By comparison, VAST has no company holidays, so technically, nobody gets Thanksgiving off (much less the half day before it!).
November 26, 2025 at 9:30 PM
Reposted by Glenn K. Lockwood
Finally, #jupiter crossed the 1 ExaFLOP/s threshold today. The list is lying to you, though: it's not exactly 1000 PFLOP/s, it's 1000.184 PFLOP/s; the rest got lost to rounding.
The 184 TFLOP/s are pretty much exactly the same as the previous #jsc […]

[Original post on mastodon.social]
November 17, 2025 at 8:24 PM
Andreas Dilger is now working for The Lustre Collective (https://thelustrecollective.com) after leaving DDN. I am glad to see his leadership continue to drive Lustre into the future. Say what you will about it, Lustre is the standard to which every other #hpc file system is compared.

#sc25
November 18, 2025 at 11:51 PM
When I spent a Thanksgiving week after SC writing the non-MPI layer for Darshan years ago, I thought to myself “surely this work will make me famous!”

I guess my ship finally came in at the PDSW keynote by Rob Ross.

#sc25
November 17, 2025 at 3:27 PM
Apparently I was the first DAOS user to complain about having to refer to DAOS containers by UUIDs, so they added container labels. Don’t know if this is completely true, but I remember voicing this and will accept the credit if Mohamad is willing to give it to me 🙂

(Learned this at my own […]
Original post on mast.hpc.social
November 16, 2025 at 4:43 PM
New SC record: ran into a colleague within 2 minutes of walking into the airport terminal from the curb. Been catching up nonstop straight through takeoff. Conference starts earlier and earlier every year.
November 15, 2025 at 6:27 PM
Chatting with a pal reminded me of a fun pre-SC activity: looking back at old conference takes that aged like milk. Remember this one?

#hpc #zettascale #hedoesntworkthereanymore
November 14, 2025 at 10:40 PM
🎶 It's the most wonderful time of the year 🎶

#sc25
November 14, 2025 at 8:22 PM
VAST and CoreWeave just announced a >$1.1 billion partnership to deliver #ai data services. Mind you, that's a billion in services, not GPUs. Though I can't claim any credit, I'm proud to work for a company that's earned this level of trust from a partner […]
Original post on mast.hpc.social
November 7, 2025 at 12:13 AM
SC25 will be my 12th SC (10th in-person). I've attended and presented on behalf of SDSC, NERSC, and Microsoft before, but I've got to say: this year has been the most work and the most stress I've ever had around the conference.
November 5, 2025 at 7:23 PM
Google recently posted a promo for using their managed #lustre service to accelerate inferencing via KV caching. Raises questions:

1. Whatever happened to Google Managed #daos (ParallelStore)? It performs better than Lustre.

2. Does Gemini use this? Unlikely. See […]
Original post on mast.hpc.social
November 4, 2025 at 4:41 PM
This is like showing up with a new boyfriend the week after the divorce. At least Microsoft is still getting those alimony payments.

https://www.aboutamazon.com/news/aws/aws-open-ai-workloads-compute-infrastructure
November 3, 2025 at 4:11 PM
NVIDIA, Oracle, and US DOE are named in the headline. Argonne is not. I don’t think this is an Argonne system.

https://nvidianews.nvidia.com/news/nvidia-oracle-us-department-of-energy-ai-supercomputer-scientific-discovery
October 30, 2025 at 3:33 PM
"AMD Powers U.S. Sovereign AI Factory Supercomputers" - What exactly is "US sovereign AI?" I think the whole point of "sovereign AI" is "not dependent on the USA." All AI is, by default, sovereign to the US […]
Original post on mast.hpc.social
October 30, 2025 at 2:32 AM
A few more details on ATS-5/Mission #hpc to be installed at LANL. Confirmed as next-gen Cray GX5000 with Vera Rubin + XDR InfiniBand. Confirms that the GX5000 can do both InfiniBand and Slingshot++.

The messaging is funny; it "will build on the success" of LANL's Venado (non-ATS; GH200) system, but […]
Original post on mast.hpc.social
October 29, 2025 at 6:04 PM
Kinda funny that OpenAI owns less of OpenAI than Microsoft does.

OpenAI (the nonprofit) holds a $130B equity stake in OpenAI (the public benefit corporation) while Microsoft holds $135B.

https://openai.com/index/built-to-benefit-everyone/
October 29, 2025 at 1:47 AM
NVIDIA announced seven new DOE #hpc systems for #ai, including a 100K Blackwell system at Argonne. Oracle is a partner, just like with yesterday's OLCF-6/Discovery announcement.

Details on the ALCF systems are scant. This procurement was out of band of the Aurora follow-on.

The LANL systems […]
Original post on mast.hpc.social
October 28, 2025 at 6:23 PM
Article implies AMD is partly paying DOE for the #hpc systems being deployed at ORNL: “The Department of Energy will host the computers, the companies will provide the machines and capital spending”

Is AMD buying market share, similar to its arrangement with OpenAI? […]
Original post on mast.hpc.social
October 28, 2025 at 12:02 AM
Glad to see DAOS getting another round of DOE investment.

Highlights a widening gap between government AI infra and commercial AI infra. I have never met an AI customer who's found relevance in DAOS. Maybe .gov can change that […]
Original post on mast.hpc.social
October 27, 2025 at 11:24 PM
I published my first (of many) technical blogs for VAST. This one gives a quantitative, real-world perspective on how much checkpoint bandwidth is required to train trillion-parameter-scale models (hint: less than many have suggested) […]
Original post on mast.hpc.social
October 23, 2025 at 1:23 AM
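The sizing arithmetic behind that claim can be sketched in a few lines. All of the numbers below (bytes per parameter, checkpoint interval, write-time budget) are my own illustrative assumptions, not figures from the blog post:

```python
# Back-of-envelope checkpoint bandwidth for a 1T-parameter model.
# Assumptions (illustrative only): bf16 weights (2 B/param) plus fp32
# master weights and Adam moments (12 B/param), checkpointed every
# 30 minutes, spending at most 5% of wall time writing.

params = 1.0e12                 # one trillion parameters
bytes_per_param = 14            # 2 (bf16) + 12 (fp32 optimizer state)
ckpt_bytes = params * bytes_per_param          # ~14 TB per checkpoint

interval_s = 30 * 60            # assumed checkpoint interval: 30 min
write_budget = 0.05             # assumed: <=5% of runtime spent writing

# Required aggregate write bandwidth across the whole job.
bandwidth = ckpt_bytes / (interval_s * write_budget)
print(f"checkpoint size: {ckpt_bytes / 1e12:.1f} TB")
print(f"required write bandwidth: {bandwidth / 1e9:.0f} GB/s")
```

With these assumptions the answer lands near 156 GB/s aggregate, which is modest per-node once sharded across thousands of ranks; the real analysis in the blog post will differ with its own parameters.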
KV caching for LLM inference is just a specialized memoization problem, which is not uncommon across #hpc, and its implementation remains pretty primitive but is improving quickly. Here's an example of scavenging "perforated" results of memoized attention layers […]
Original post on mast.hpc.social
October 21, 2025 at 5:06 PM
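The memoization framing can be made concrete with a toy sketch. This is a hypothetical, dependency-free illustration (real KV caches hold per-layer key/value tensors on the GPU, not Python tuples): extending a previously seen prompt only computes state for the new suffix.

```python
# Toy illustration of KV caching as memoization (not a real inference
# engine): per-token "attention state" is cached keyed by token prefix,
# so re-running an extended prompt only computes the new tail.

class KVCache:
    def __init__(self):
        self._cache = {}  # token prefix (tuple) -> list of per-token states

    def attend(self, tokens):
        """Return per-token states for `tokens`, reusing any cached prefix.

        Also returns how many tokens actually required fresh computation.
        """
        tokens = tuple(tokens)
        # Find the longest previously computed prefix of this sequence.
        best = ()
        for prefix in self._cache:
            if len(prefix) > len(best) and tokens[:len(prefix)] == prefix:
                best = prefix
        states = list(self._cache.get(best, []))
        computed = 0
        # Only the suffix beyond the cached prefix needs new work.
        for i in range(len(best), len(tokens)):
            states.append(self._fake_kv(tokens[i]))
            computed += 1
        self._cache[tokens] = states
        return states, computed

    @staticmethod
    def _fake_kv(token):
        # Stand-in for computing one token's key/value pair.
        return ("kv", token)
```

A first call with `[1, 2, 3]` computes three states; a second call with `[1, 2, 3, 4]` computes only one, which is exactly the prefix-reuse property that production KV caches exploit.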
Love to see DAOS getting mainstream attention, but it's a long road ahead before it can compete in the GPU cloud market. Everyone says their performance is the best; that doesn't differentiate storage for #ai. It's the hard problems - integration, reliability, etc […]
Original post on mast.hpc.social
October 16, 2025 at 6:23 PM
I'll be talking about how hyperscale #ai resembles #hpc workflows at an #sc25 Exhibitor Forum session on Thurs 11/20.

Excited to be on stage at SC again. Aiming to maintain the same level of technical quality as before I left .gov.

Details […]
Original post on mast.hpc.social
October 16, 2025 at 12:28 AM