Ethan
@ethansmith2000.com
a boy and his gpu vs the world. cofounder/directing research at
@leonardoai_. (now at @canva) trying to feel the magic.
www.ethansmith2000.com
this is so cool
December 1, 2024 at 4:40 AM
it's crazy to me that RoPE's issue with BF16 wasn't noticed earlier.
For a reasonable N of 2048, these are the computed frequencies prior to cos(x) & sin(x) for fp32 above and bf16 below.
Given how short the period of simple trig functions is, this difference is catastrophic for large values.
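A minimal sketch of the effect, not the original plot. It assumes the highest RoPE frequency (inverse frequency 1.0, so the angle equals the position index) and emulates bf16 in pure Python by rounding a float32 bit pattern to its top 16 bits. The names `to_bf16` and `rope_angle_errors` are hypothetical helpers made up for this illustration. bf16 keeps only 8 significant bits, so between 1024 and 2048 the representable values sit 8 apart, and an angle can land several radians away from where it should be.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    Hypothetical helper: bf16 is the top 16 bits of a float32, so we round
    the float32 bit pattern and zero the low 16 bits. Finite inputs only.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even neighbour
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def rope_angle_errors(n_positions: int, inv_freq: float = 1.0):
    """Return (position, exact angle, bf16 angle, |cos error|) per position.

    Assumption: a single frequency band; inv_freq=1.0 is the fastest one.
    """
    rows = []
    for pos in range(n_positions):
        exact = pos * inv_freq
        rounded = to_bf16(exact)
        cos_err = abs(math.cos(exact) - math.cos(rounded))
        rows.append((pos, exact, rounded, cos_err))
    return rows
```

Calling `rope_angle_errors(2048)` and looking at the last rows shows the problem: position 2047 rounds to 2048 in bf16, a full radian off, and the cos/sin taken afterwards inherit that error.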
November 28, 2024 at 12:09 PM
Just added FSDP2 support for MARS and Muon!
November 25, 2024 at 10:39 PM
ADAM's been tuned, but SOAP and PSGD are just using default params. You love to see it.
November 24, 2024 at 11:36 PM
Here, have PSGD-Kron and SOAP with FSDP2 support. Please go wild with it, let's see something finally replace ADAM.
github.com/ethansmith20...
November 23, 2024 at 4:02 PM
from a 2012 lecture referencing a 1981 paper (and it all really goes further back than that)
If ML experimentation software had been as widely accessible back then as it is now, I'd bet generative diffusion models would have been discovered a while ago.
November 23, 2024 at 10:09 AM
probably the best in-depth explanation i've seen on FSDP at the most granular levels, props to the authors
dev-discuss.pytorch.org/t/fsdp-cudac...
November 23, 2024 at 5:02 AM