Omead Pooladzandi ✈️ NeurIPS'24
@hessianfree.bsky.social
Optimization Generative Modeling @Caltech, PhD @UCLA. ex Research Scientist Intern @AIatMeta (opinions are my own) why is jax so difficult
Hello world
February 26, 2025 at 9:08 PM
Newton-Schulz isn't the answer even for instantaneous whitening.

PSGD: MSE( Q.T Q H , I ) = 5.2e-3
Zero-Power NS 100 iterations: MSE( NS(G) , I ) = 8.2e-1
True Inverse: MSE( H^(-1/2) H H^(-1/2), I ) = 6.1e-3

PSGD whitens information significantly better than the Newton-Schulz iterations used in Muon.
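A minimal sketch of what these whitening metrics measure, under stated assumptions. It uses the classical cubic Newton-Schulz iteration rather than Muon's tuned variant, a synthetic symmetric positive definite H, and hypothetical helper names (`mse_to_identity`, `inv_sqrt_spd`, `newton_schulz_zero_power`, `random_spd`). It does not fit a PSGD preconditioner Q and is not meant to reproduce the numbers above. For a symmetric positive definite input, the "zero power" that Newton-Schulz targets is the identity, which is why its output can be compared against I.

```python
import numpy as np

def mse_to_identity(M):
    """Mean squared elementwise error between a square matrix M and I."""
    n = M.shape[0]
    return float(np.mean((M - np.eye(n)) ** 2))

def inv_sqrt_spd(H):
    """Exact H^(-1/2) of a symmetric positive definite H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V / np.sqrt(w)) @ V.T

def newton_schulz_zero_power(G, steps):
    """Classical cubic Newton-Schulz iteration toward the orthogonal polar factor.

    Assumption: this is the textbook X <- 1.5 X - 0.5 X X^T X update,
    not the tuned polynomial used in Muon.
    """
    X = G / np.linalg.norm(G)  # Frobenius norm keeps all singular values <= 1
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

def random_spd(n, cond, seed=0):
    """Synthetic SPD matrix with eigenvalues log-spaced from 1 down to 1/cond."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    w = np.logspace(0, -np.log10(cond), n)
    return (Q * w) @ Q.T

if __name__ == "__main__":
    H = random_spd(8, cond=1e8)
    W = inv_sqrt_spd(H)
    print("exact inverse sqrt:", mse_to_identity(W @ H @ W))
    for steps in (5, 20, 100):
        print(f"NS {steps} steps:", mse_to_identity(newton_schulz_zero_power(H, steps)))
```

Small singular values grow by only about 1.5x per step under this cubic update, so the number of steps needed rises with the condition number of the input.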
December 27, 2024 at 9:24 AM
Xilin is back at it again. Results are clear: damping hurts precision, but lower precision needs it if the underlying Hessian is extremely poorly conditioned.
December 7, 2024 at 4:46 PM
PSGD tracking Muon on modded nanoGPT
December 2, 2024 at 5:30 AM
Who is going to NeurIPS?
November 30, 2024 at 5:55 PM
Lol AISTATS reviews consisted of one 5 rating (top 10% of accepted papers) with a confidence of 5 (absolutely certain). The reviewer raved about how good PSGD is.

And two rejects with a score of 1 and a confidence of 4. And one borderline reject with a confidence of 4.
November 28, 2024 at 8:17 PM
Reposted by Omead Pooladzandi ✈️ NeurIPS'24
SmolVLM was just released 🚀

It's a great, small, and fully open VLM that I'm really excited about for fine-tuning and on-device use cases 💻

It also comes with 0-day MLX support via mlx-vlm, here's it running at > 80 tok/s on my M1 Max 🤯
November 26, 2024 at 4:36 PM
Reposted by Omead Pooladzandi ✈️ NeurIPS'24
Just put together a starter pack for Deep Learning Theory. Let me know if you'd like to be included or suggest someone to add to the list!

go.bsky.app/2qnppia
November 22, 2024 at 9:35 PM
PSGD ❤️ MARS

MARS is an exciting new variance reduction technique from @quanquangu.bsky.social's group that can help stabilize and accelerate your deep learning pipeline. All that is needed is a gradient buffer. Here, MARS speeds up the convergence of PSGD, ultimately leading to a better solution.
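A minimal sketch of how a gradient buffer can drive this kind of variance reduction, assuming the approximate form of the MARS correction in which the previous step's stored gradient stands in for a re-evaluated one. The function name, the default values of `beta1` and `gamma`, and the unit-norm clipping threshold are illustrative assumptions, and the PSGD step that would consume the corrected gradient is not shown.

```python
import numpy as np

def mars_corrected_gradient(grad, prev_grad, beta1=0.95, gamma=0.025, max_norm=1.0):
    """Return a variance-reduced gradient built from the current and buffered gradients.

    Assumed form: c = g_t + gamma * beta1 / (1 - beta1) * (g_t - g_{t-1}),
    followed by rescaling to max_norm when the norm of c exceeds it.
    """
    c = grad + gamma * (beta1 / (1.0 - beta1)) * (grad - prev_grad)
    norm = np.linalg.norm(c)
    if norm > max_norm:
        c = c * (max_norm / norm)
    return c

if __name__ == "__main__":
    prev_grad = np.zeros(3)  # the gradient buffer, empty before the first step
    for step in range(3):
        grad = np.array([0.10, -0.20, 0.05]) * (step + 1)  # stand-in gradients
        c = mars_corrected_gradient(grad, prev_grad)
        prev_grad = grad  # refresh the buffer for the next step
        print(step, c)
```

The corrected gradient `c` would then be passed to the optimizer in place of the raw gradient; the only extra state is the one buffered gradient per parameter.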
November 26, 2024 at 4:21 AM
Oftentimes PSGD will be slow to close plasticity, resulting in slightly slower convergence but ultimately a better solution.
November 25, 2024 at 12:34 AM
Okayyy I should actually start posting about PSGD here
November 24, 2024 at 7:24 PM
Hello World!
November 24, 2024 at 4:35 PM
Reposted by Omead Pooladzandi ✈️ NeurIPS'24
Radon Transform (RT) was formulated in 1917 but remained useless in practice until CT scanners were invented in the 60s

But RT isn't just for CTs. It's a sort of generalization of marginals in probability

RT g(p,θ): Shoot rays at θ+90 & offset p, measure line integrals of f(x,y) along the ray

1/n
November 24, 2024 at 12:33 AM
Reposted by Omead Pooladzandi ✈️ NeurIPS'24
Starter packs are helpful as well as the twitter import tool chromewebstore.google.com/detail/sky-f...
Sky Follower Bridge - Chrome Web Store
Instantly find and follow the same users from your Twitter follows on Bluesky.
November 23, 2024 at 8:36 PM
Just some light reading
November 24, 2024 at 4:40 AM
Reposted by Omead Pooladzandi ✈️ NeurIPS'24
Here, have PSGD-Kron and SOAP with FSDP2 support. Please go wild with it, let's see something finally replace ADAM.
github.com/ethansmith20...
November 23, 2024 at 4:02 PM
Reposted by Omead Pooladzandi ✈️ NeurIPS'24
probably the best in-depth explanation i've seen on FSDP at the most granular levels, props to the authors
dev-discuss.pytorch.org/t/fsdp-cudac...
November 23, 2024 at 5:02 AM