Lightnews — Scholar-powered news

Ethan

@ethansmith2000.com

this is so cool

December 1, 2024 at 4:40 AM

Ethan

@ethansmith2000.com

it's crazy to me that RoPE's issue with BF16 wasn't noticed earlier.
For a reasonable N of 2048, these are the computed frequencies prior to cos(x) & sin(x) for fp32 above and bf16 below.
Given how short the period is of simple trig functions, this difference is catastrophic for large values.
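
A minimal sketch of the effect described above, using only the Python standard library. It assumes the usual RoPE angle formula `pos * base**(-2i/dim)` with `base=10000` and `dim=64` (neither value is given in the post), and it emulates bf16 by rounding a float32 down to its top 16 bits. The helper names `to_fp32`, `to_bf16`, `rope_angles`, and `worst_error` are illustrative and do not come from any library.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to the nearest bfloat16 value (round-to-nearest-even).

    bfloat16 is the top 16 bits of a float32, so the low 16 bits are
    rounded away. NaN/inf handling is omitted: inputs here are finite.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even neighbour
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def rope_angles(pos: int, cast, dim: int = 64, base: float = 10000.0):
    """Angles fed to cos/sin, rounding every intermediate with `cast`.

    dim and base are assumed values, not taken from the post.
    """
    angles = []
    for i in range(dim // 2):
        inv_freq = cast(base ** (-2.0 * i / dim))
        angles.append(cast(cast(float(pos)) * inv_freq))
    return angles


def worst_error(n_positions: int = 2048) -> float:
    """Largest |bf16 angle - fp32 angle| over all positions and channels."""
    worst = 0.0
    for pos in range(n_positions):
        a32 = rope_angles(pos, to_fp32)
        a16 = rope_angles(pos, to_bf16)
        worst = max(worst, max(abs(a - b) for a, b in zip(a32, a16)))
    return worst


if __name__ == "__main__":
    print("bf16(2047) =", to_bf16(2047.0))
    print("worst angle error in radians:", worst_error())
```

bf16 keeps only 8 significant bits, so consecutive integers stop being representable above 256, and between 1024 and 2048 the representable values are 8 apart. Position 2047 becomes 2048, and position 1028 becomes 1024: an error of 4 radians on the fastest channel, more than half of the 2π period.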

November 28, 2024 at 12:09 PM

Ethan

@ethansmith2000.com

This is the big one and I can’t stress this enough. All of your data everywhere is being gathered and used anyway by private actors. The only fire you can fight back with is to play on that same field and democratize it. This anger is way mistargeted

merve @merve.bsky.social · Nov 27

Most importantly the mission: empower people against big AI monopolies, be transparent about it and implications.
Large companies are training models and gating them and making you pay per use with no transparency on biases, fairness of the model or societal/environmental impact.

November 28, 2024 at 9:41 AM

Ethan

@ethansmith2000.com

November 28, 2024 at 9:35 AM

Ethan

@ethansmith2000.com

Just added FSDP2 support for MARS and Muon!

November 25, 2024 at 10:39 PM

Reposted by Ethan

TimDarcet

@timdarcet.bsky.social

Excellent writeup on GPU streams / CUDA memory
dev-discuss.pytorch.org/t/fsdp-cudac...

TLDR: by default, memory belongs to a single stream. To share it:
- `Tensor.record_stream` -> automatic, but can be suboptimal and nondeterministic
- `Stream.wait` -> manual, but precise control
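
A hedged sketch of the two options in PyTorch. It assumes that `Stream.wait` in the list above refers to `torch.cuda.Stream.wait_stream`. The function name `share_tensor_across_streams` and its `use_record_stream` flag are illustrative, not part of PyTorch. The import is lazy so the snippet loads without PyTorch, and the function returns `None` when no GPU is available.

```python
def share_tensor_across_streams(use_record_stream: bool = True):
    """Produce a tensor on one CUDA stream and consume it on another."""
    import torch  # lazy import: only needed when the demo actually runs

    if not torch.cuda.is_available():
        return None  # nothing to demonstrate without a GPU

    producer = torch.cuda.Stream()
    consumer = torch.cuda.Stream()

    with torch.cuda.stream(producer):
        # The caching allocator ties this block to the producer stream.
        x = torch.ones(1024, device="cuda")

    # Compute ordering: the consumer must not read x before it is written.
    consumer.wait_stream(producer)
    with torch.cuda.stream(consumer):
        y = x * 2

    if use_record_stream:
        # Automatic: the allocator will not reuse x's block until the
        # work queued on `consumer` so far has finished.
        x.record_stream(consumer)
    else:
        # Manual: later work on `producer`, including any reuse of x's
        # block, is ordered after the consumer's read.
        producer.wait_stream(consumer)
    del x

    torch.cuda.synchronize()  # make y safe to read from the default stream
    return float(y.sum())
```

Both branches protect the same thing: the memory behind `x` must not be handed to a new allocation while the consumer stream may still be reading it.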

November 24, 2024 at 10:04 PM

Ethan

@ethansmith2000.com

Incredible to see what is likely SOTA results coming out of open source with full reproducibility!
Happy to have helped provide the compute for this and hoping to support more awesome research like this!

Fern @fernbear.bsky.social · Nov 25

New NanoGPT training speed record: 3.28 FineWeb val loss in 4.66 minutes

Previous record: 5.03 minutes
Changelog:
- FlexAttention blocksize warmup
- hyperparameter tweaks

November 25, 2024 at 2:50 AM

Reposted by Ethan

Fern

@fernbear.bsky.social

First, my sincerest thanks to @leonardoai.bsky.social with the help of
@ethansmith2000.com for generously providing H100s to support this research to enable this release. Y'all rock, thanks so much! <3

November 25, 2024 at 1:59 AM

Reposted by Ethan

Fern

@fernbear.bsky.social

New NanoGPT training speed record: 3.28 FineWeb val loss in 4.66 minutes

Previous record: 5.03 minutes
Changelog:
- FlexAttention blocksize warmup
- hyperparameter tweaks

November 25, 2024 at 1:53 AM

Reposted by Ethan

xjdr

@xjdr.bsky.social

i'm trying to follow as many of my old moots as possible and new people as i find them. some of y'all changing your pfp is just mean spirited (im lazy and learned people's pfps not names)

November 24, 2024 at 5:08 PM

Reposted by Ethan

Rosmine

@rosmineb.bsky.social

Untuned SOAP beats tuned adamw at every single step

Ethan @ethansmith2000.com · Nov 24

ADAM's been tuned but SOAP and PSGD just using default params, you love to see it.

November 25, 2024 at 12:08 AM

Ethan

@ethansmith2000.com

ADAM's been tuned but SOAP and PSGD just using default params, you love to see it.

November 24, 2024 at 11:36 PM

Ethan

@ethansmith2000.com

I goofed and never tested distributed saving, but now it works!
It was a little annoying as both SOAP and psgd maintain preconds as lists of varying size, which fail to be pickled. To fix this I hardcoded there to be a max of 4 (based on conv layers being 4d tensors).
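
One way to read the "max of 4" fix, sketched in plain Python. This is an assumption about the general idea, not the code from the linked repository: every parameter gets the same four named preconditioner slots, and unused slots hold `None`, so the saved state has an identical layout for every parameter. The names `MAX_PRECOND_DIMS`, `pack_preconds`, and `unpack_preconds` are hypothetical, and nested lists stand in for tensors.

```python
MAX_PRECOND_DIMS = 4  # conv weights are 4-D, the largest rank assumed here


def pack_preconds(preconds):
    """Store a variable-length list of preconditioners in fixed slots."""
    if len(preconds) > MAX_PRECOND_DIMS:
        raise ValueError(f"expected at most {MAX_PRECOND_DIMS} preconditioners")
    return {
        f"Q_{i}": preconds[i] if i < len(preconds) else None
        for i in range(MAX_PRECOND_DIMS)
    }


def unpack_preconds(state):
    """Recover the original list by dropping the empty slots."""
    slots = (state[f"Q_{i}"] for i in range(MAX_PRECOND_DIMS))
    return [q for q in slots if q is not None]


# A 2-D weight (linear layer) has two factors, a 4-D weight (conv) has four.
linear_state = pack_preconds([[[1.0]], [[2.0]]])
conv_state = pack_preconds([[[1.0]], [[2.0]], [[3.0]], [[4.0]]])
```

Both `linear_state` and `conv_state` expose the keys `Q_0` through `Q_3`, which is the property a checkpointing routine with a fixed schema needs.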

Ethan @ethansmith2000.com · Nov 23

Here, have PSGD-Kron and SOAP with FSDP2 support. Please go wild with it, let's see something finally replace ADAM.
github.com/ethansmith20...

November 24, 2024 at 8:35 PM

Ethan

@ethansmith2000.com

I’ve generally preferred research to software engineering but I am growing a liking for building the tools used for research

November 24, 2024 at 10:50 AM

Reposted by Ethan

Ethan

@ethansmith2000.com

Here, have PSGD-Kron and SOAP with FSDP2 support. Please go wild with it, let's see something finally replace ADAM.
github.com/ethansmith20...

November 23, 2024 at 4:02 PM

Reposted by Ethan

Lucas Beyer (bl16)

@giffmana.ai

PSA: force-pushing to GitHub does NOT actually delete/overwrite anything. Everything you squash and force-push over is still there:

November 23, 2024 at 10:16 PM

Ethan

@ethansmith2000.com

from a 2012 lecture referencing a 1981 paper (and it all really goes further back than that)
If ML experimentation software had been as widely accessible as it is now, I'd bet generative diffusion models would have been discovered a while ago.

November 23, 2024 at 10:09 AM

Ethan

@ethansmith2000.com

probably the best in-depth explanation i've seen on FSDP at the most granular levels, props to the authors
dev-discuss.pytorch.org/t/fsdp-cudac...

November 23, 2024 at 5:02 AM

Reposted by Ethan

Juan Lam

@juanlam.com

🥳 You can wishlist my #MR game Galactic Traffic Control NOW on Meta Quest! 🚀☄️🕹️ It's coming soon :D

GTC is about guiding spaceships from your walls to their portals by drawing paths and destroying obstacles

meta.com/experiences/...

Let's talk about the features... 🧵
#gamedev #vr #xr #MetaQuest

November 22, 2024 at 11:14 PM

Ethan

@ethansmith2000.com

Besides X having a longer existing community, I can’t imagine any reason you wouldn’t choose Bluesky over X. Your experience being at the mercy of algo-makers is so tired.

November 21, 2024 at 10:37 AM
