Ethan
@ethansmith2000.com
a boy and his gpu vs the world. cofounder/directing research at
@leonardoai_. (now at @canva) trying to feel the magic.
www.ethansmith2000.com
this is so cool
December 1, 2024 at 4:40 AM
it's crazy to me that RoPE's issue with BF16 wasn't noticed earlier.
For a reasonable N of 2048, these are the computed frequencies prior to cos(x) & sin(x) for fp32 above and bf16 below.
Given how short the period of simple trig functions is, this difference is catastrophic for large values.
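The precision loss is easy to reproduce without any GPU. A minimal sketch below emulates bfloat16 in pure Python by truncating an fp32 value to its top 16 bits (round-toward-zero; real hardware usually rounds to nearest, but the magnitude of the precision loss is the same), then looks at the raw RoPE angle at the last position of an N = 2048 sequence:

```python
import math
import struct

def to_bf16(x: float) -> float:
    # emulate bfloat16: keep the top 16 bits of the fp32 encoding
    # (sign, exponent, 7 mantissa bits), zero the rest
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

# RoPE angle is theta = pos * base**(-2i/d); at i = 0 the angle is just pos.
pos = 2047.0                 # last position for N = 2048
angle_fp32 = pos
angle_bf16 = to_bf16(pos)

err = abs(angle_fp32 - angle_bf16)
# with only ~8 mantissa bits, the rounding error alone (here 7.0) exceeds
# a full 2*pi period of sin/cos, so the resulting rotation is garbage
print(angle_bf16, err, err > 2 * math.pi)
```

With round-to-nearest the representable neighbors are 2040 and 2048, so the error is still several radians: either way the angle lands a full period away from where fp32 puts it.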
November 28, 2024 at 12:09 PM
This is the big one and I can’t stress this enough. All of your data everywhere is being gathered and used anyway by private actors. The only fire you can fight back with is to play on that same field and democratize it. This anger is way mistargeted
Most importantly the mission: empower people against big AI monopolies, be transparent about it and implications.
Large companies are training models and gating them and making you pay per use with no transparency on biases, fairness of the model or societal/environmental impact.
November 28, 2024 at 9:41 AM
November 28, 2024 at 9:35 AM
Just added FSDP2 support for MARS and Muon!
November 25, 2024 at 10:39 PM
Reposted by Ethan
Excellent writeup on GPU streams / CUDA memory
dev-discuss.pytorch.org/t/fsdp-cudac...

TLDR: by default, memory is private to a stream; to share it:
- `Tensor.record_stream` -> automatic, but can be suboptimal and nondeterministic
- `Stream.wait_stream` -> manual, but gives precise control
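The two sharing mechanisms above can be sketched with PyTorch's CUDA stream API (a minimal GPU-only sketch; the function name is illustrative, and in practice you would pick one of the two options, not both):

```python
import torch

def produce_then_consume():
    """Sketch: hand a tensor allocated on one stream to another stream safely."""
    assert torch.cuda.is_available(), "requires a CUDA device"
    s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()

    with torch.cuda.stream(s1):
        x = torch.randn(1024, device="cuda")  # allocation is tied to s1

    # Option A: automatic but coarse -- tells the caching allocator not to
    # reuse x's memory until all work currently queued on s2 finishes
    # (can over-synchronize, and reuse timing is nondeterministic).
    x.record_stream(s2)

    # Option B: manual but precise -- make s2 wait on everything queued
    # on s1 so far; after this it is safe to read x from s2.
    s2.wait_stream(s1)
    with torch.cuda.stream(s2):
        y = x * 2
    torch.cuda.synchronize()
    return y
```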
November 24, 2024 at 10:04 PM
Incredible to see likely-SOTA results coming out of open source with full reproducibility!
Happy to have helped provide the compute for this and hoping to support more awesome research like this!
New NanoGPT training speed record: 3.28 FineWeb val loss in 4.66 minutes

Previous record: 5.03 minutes
Changelog:
- FlexAttention blocksize warmup
- hyperparameter tweaks
November 25, 2024 at 2:50 AM
Reposted by Ethan
First, my sincerest thanks to @leonardoai.bsky.social with the help of
@ethansmith2000.com for generously providing H100s to support this research to enable this release. Y'all rock, thanks so much! <3
November 25, 2024 at 1:59 AM
Reposted by Ethan
i'm trying to follow as many of my old moots as possible and new people as i find them. some of y'all changing your pfp is just mean spirited (i'm lazy and learned people's pfps not names)
November 24, 2024 at 5:08 PM
Reposted by Ethan
Untuned SOAP beats tuned AdamW at every single step
ADAM's been tuned but SOAP and PSGD just using default params, you love to see it.
November 25, 2024 at 12:08 AM
I goofed and never tested distributed saving, but now it works!
It was a little annoying, as both SOAP and PSGD maintain preconditioners as lists of varying size, which fail to pickle. To fix this I hardcoded a max of 4 (based on conv layers being 4d tensors).
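The fix described above can be sketched in a few lines (names and the cap of 4 mirror the post; the helpers themselves are illustrative, not the repo's actual code): pad every variable-length preconditioner list to a fixed length so each rank serializes an identical structure, then drop the padding on load.

```python
MAX_PRECONDS = 4  # conv weights are at most 4-D tensors, so at most 4 factors

def pad_preconds(preconds, max_len=MAX_PRECONDS):
    """Pad a variable-length list of preconditioner factors with None
    so every parameter's state has the same fixed shape when saved."""
    if len(preconds) > max_len:
        raise ValueError(f"expected at most {max_len} factors, got {len(preconds)}")
    return list(preconds) + [None] * (max_len - len(preconds))

def unpad_preconds(padded):
    """Recover the original variable-length list after loading."""
    return [p for p in padded if p is not None]
```

Uniform structure is what matters here: distributed checkpointing wants every rank's state dict to line up entry-for-entry.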
Here, have PSGD-Kron and SOAP with FSDP2 support. Please go wild with it, let's see something finally replace ADAM.
github.com/ethansmith20...
November 24, 2024 at 8:35 PM
I’ve generally preferred research to software engineering but I am growing a liking for building the tools used for research
November 24, 2024 at 10:50 AM
Reposted by Ethan
Here, have PSGD-Kron and SOAP with FSDP2 support. Please go wild with it, let's see something finally replace ADAM.
github.com/ethansmith20...
November 23, 2024 at 4:02 PM
Reposted by Ethan
PSA: force-pushing to GitHub does NOT actually delete/overwrite anything. Everything you squash and force-push over is still there:
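A quick throwaway-repo sketch of the mechanics (local analogue: a rewrite orphans commits but does not delete the objects, and a remote's object store behaves the same way, which is why force-pushed commits remain fetchable by SHA):

```shell
# sketch: rewriting history orphans a commit; it does not delete it
set -e
tmp="$(mktemp -d)"
cd "$tmp"
git init -q -b main
git -c user.name=t -c user.email=t@t commit -q --allow-empty -m "base"
git -c user.name=t -c user.email=t@t commit -q --allow-empty -m "oops: leaked secret"
leaked="$(git rev-parse HEAD)"
git reset -q --hard HEAD~1     # rewrite history, as you would before a force-push
git cat-file -t "$leaked"      # prints "commit": the object is still in the store
```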
November 23, 2024 at 10:16 PM
Here, have PSGD-Kron and SOAP with FSDP2 support. Please go wild with it, let's see something finally replace ADAM.
github.com/ethansmith20...
November 23, 2024 at 4:02 PM
from a 2012 lecture referencing a 1981 paper (and it all really goes further back than that)
If ML experimentation software had been as widely accessible back then as it is now, I'd bet generative diffusion models would have been discovered a while ago.
November 23, 2024 at 10:09 AM
probably the best in-depth explanation i've seen on FSDP at the most granular levels, props to the authors
dev-discuss.pytorch.org/t/fsdp-cudac...
November 23, 2024 at 5:02 AM
Reposted by Ethan
🥳 You can wishlist my #MR game Galactic Traffic Control NOW on Meta Quest! 🚀☄️🕹️ It's coming soon :D

GTC is about guiding spaceships from your walls to their portals by drawing paths and destroying obstacles

meta.com/experiences/...

Let's talk about the features... 🧵
#gamedev #vr #xr #MetaQuest
November 22, 2024 at 11:14 PM
Besides X having a longer existing community, I can’t imagine any reason you wouldn’t choose Bluesky over X. Your experience being at the mercy of algo-makers is so tired.
November 21, 2024 at 10:37 AM