I wish there were code for VeLO and this that actually worked
arxiv.org/abs/2209.11208
bsky.app/profile/hess...
MARS is an exciting new variance reduction technique from @quanquangu.bsky.social's group that can help stabilize and accelerate your deep learning pipeline. All it needs is a gradient buffer. Here MARS speeds up the convergence of PSGD, ultimately leading to a better solution.
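For anyone wondering what "a gradient buffer" could mean in practice, here is a rough sketch. It assumes the approximate form of MARS, where the previous step's gradient is kept in a buffer and reused instead of being recomputed on the current batch. The function name, the default values for `beta1`, `gamma`, and `max_norm`, and the plain NumPy setting are all illustrative assumptions, not code from the linked repo.

```python
import numpy as np

def mars_corrected_gradient(grad, prev_grad, beta1=0.95, gamma=0.025, max_norm=1.0):
    """Sketch of a MARS-style corrected gradient (hypothetical helper).

    grad      : gradient at the current step
    prev_grad : gradient stored from the previous step (the "gradient buffer")
    """
    # Correction term: scaled difference between the current and buffered gradient.
    correction = gamma * (beta1 / (1.0 - beta1)) * (grad - prev_grad)
    corrected = grad + correction
    # Clip the corrected gradient so its L2 norm does not exceed max_norm.
    norm = np.linalg.norm(corrected)
    if norm > max_norm:
        corrected = corrected * (max_norm / norm)
    return corrected

# Usage: feed the corrected gradient to whatever optimizer you already use,
# then overwrite the buffer with the current raw gradient.
grad = np.array([0.10, -0.20])
buffer = np.array([0.05, -0.10])
update_direction = mars_corrected_gradient(grad, buffer)
buffer = grad.copy()
```

The corrected gradient is a drop-in replacement for the raw gradient, which is why the same idea can sit in front of different base optimizers such as PSGD.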
github.com/evanatyourse...