Omead Pooladzandi ✈️ NeurIPS'24
hessianfree.bsky.social
Optimization Generative Modeling @Caltech, PhD @UCLA. ex Research Scientist Intern @AIatMeta (opinions are my own) why is jax so difficult
Be in touch!
November 30, 2024 at 6:12 PM
Cheers !
November 30, 2024 at 1:14 AM
Maybe just a skill issue, but I couldn't get the darn thing to run. I wanted to whiten the gradients before giving them to VeLO to see how it affects performance. Would you be interested in helping me with a few experiments?
November 29, 2024 at 7:52 AM
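The whitening idea above can be sketched as a simple preprocessing step. This is a minimal, hypothetical illustration (the helper name and the RMS-normalization choice are assumptions, not VeLO's API): rescale each gradient to unit RMS before passing it to the learned optimizer.

```python
import numpy as np

def whiten_gradient(g, eps=1e-8):
    """Hypothetical helper: rescale a gradient to unit RMS before
    handing it to a learned optimizer such as VeLO. This is one
    simple notion of 'whitening'; full whitening would also
    decorrelate components via the gradient covariance."""
    rms = np.sqrt(np.mean(g ** 2))
    return g / (rms + eps)

g = np.array([3.0, -4.0])
w = whiten_gradient(g)  # same direction as g, unit RMS magnitude
```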
I was just looking, and it seems there is also this:
I wish there were code for VeLO and this that actually worked.
arxiv.org/abs/2209.11208
A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases
Learned optimizers -- neural networks that are trained to act as optimizers -- have the potential to dramatically accelerate training of machine learning models. However, even when meta-trained across...
arxiv.org
November 29, 2024 at 6:56 AM
Totally agree!
November 29, 2024 at 5:28 AM
The rejects were horribly misinformed and self-contradictory, but extremely confident. PSGD, SOAP, and friends are taking over regardless of academia.
November 28, 2024 at 8:17 PM
Here is a post I made showing MARS actually helping the initial convergence of PSGD. I believe this is happening because MARS reduces the variance of the gradients, which here resulted in slightly faster convergence. But it is unclear how this affects PSGD later in training!

bsky.app/profile/hess...
PSGD ❤️ MARS

MARS is a new exciting variance reduction technique from @quanquangu.bsky.social 's group which can help stabilize and accelerate your deep learning pipeline. All that is needed is a gradient buffer. Here MARS speeds up the convergence of PSGD ultimately leading to a better solution.
November 28, 2024 at 2:16 AM
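The "all that is needed is a gradient buffer" point can be sketched concretely. This is an illustrative MARS-style correction, not the authors' implementation (the function name and default constants here are assumptions): form a corrected gradient from the current gradient plus a scaled difference against the buffered previous gradient, then clip its norm before passing it to the base optimizer.

```python
import numpy as np

def mars_correct(grad, prev_grad, gamma=0.025, beta1=0.95, max_norm=1.0):
    """Sketch of a MARS-style variance-reduced gradient.
    grad: current stochastic gradient.
    prev_grad: buffered gradient from the previous step.
    The scaled gradient difference reduces variance; clipping
    keeps the corrected gradient from blowing up early on."""
    c = grad + gamma * (beta1 / (1.0 - beta1)) * (grad - prev_grad)
    norm = np.linalg.norm(c)
    if norm > max_norm:
        c = c * (max_norm / norm)
    return c
```

The corrected gradient `c` would then be fed to the base optimizer (PSGD here) in place of the raw gradient, with `prev_grad` updated to `grad` after each step.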
Hi @clementpoiret.bsky.social I am one of the co-authors of PSGD from 2022, and actively working on PSGD Kron with Xilin and @evanatyourservice.bsky.social glad you are excited about PSGD Kron!
PSGD is somewhat orthogonal to MARS, and as such MARS can be easily adopted. Here is a branch that has them combined.

github.com/evanatyourse...
kron_torch/kron_torch/kron.py at mars · evanatyourservice/kron_torch
An implementation of PSGD Kron second-order optimizer for PyTorch - evanatyourservice/kron_torch
github.com
November 28, 2024 at 2:12 AM
November 26, 2024 at 4:08 AM
Bro pfp change messes w me so much.
November 25, 2024 at 12:37 AM
We are learning the curvature. It can take some time. You can get it to converge faster if you increase the LR of the curvature fitting.
November 25, 2024 at 12:13 AM
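The point about the curvature-fitting LR can be illustrated with a toy fixed-point iteration. This is not the actual PSGD preconditioner update (the function name, the diagonal target, and `precond_lr` are illustrative assumptions); it only shows why a larger fitting rate makes the curvature estimate converge faster.

```python
import numpy as np

def fit_diag_preconditioner(q, grad, precond_lr=0.1):
    """Toy sketch: nudge a diagonal preconditioner q toward a
    target scaling (here 1/|g|) at a tunable curvature-fitting
    rate. A larger precond_lr converges faster but fits more
    noisily; PSGD's real update fits Q against a different
    criterion, this only illustrates the LR trade-off."""
    target = 1.0 / (np.abs(grad) + 1e-8)
    return q + precond_lr * (target - q)

q = np.ones(2)
g = np.array([2.0, 4.0])
for _ in range(200):
    q = fit_diag_preconditioner(q, g, precond_lr=0.5)
# q approaches [0.5, 0.25]; halving precond_lr slows that approach
```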