Looking for great research collaborations
https://riverstone496.github.io/
www.arxiv.org/abs/2505.24333
While I’m deeply impressed by his performance, I’m surprised he’s doing it from an ordinary office chair.
m.youtube.com/watch?v=fDsg...
arxiv.org/abs/2510.09378
x.com/deepcohen/st...
NGD builds its curvature from the function gradient df/dw, while optimizers like Adam and Shampoo build their preconditioners from the loss gradient dL/dw.
I’ve always wondered which is better, since maintaining an EMA of loss gradients might cause loss spikes late in training.
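A minimal sketch of the distinction in JAX, using a toy scalar regression model (the model, data, and EMA constant below are all illustrative, not from the linked paper):

```python
# Toy sketch: curvature from df/dw (NGD/Fisher) vs. statistics from dL/dw (Adam-style).
import jax
import jax.numpy as jnp

def f(w, x):                        # model output: a scalar linear model
    return jnp.dot(w, x)

def loss(w, x, y):                  # squared-error loss
    return 0.5 * (f(w, x) - y) ** 2

w = jnp.array([0.5, -0.3])
x = jnp.array([1.0, 2.0])
y = jnp.array(1.0)

# NGD-style curvature: built from the *function* gradient df/dw.
# For squared error, the per-sample Gauss-Newton / Fisher contribution is
# J^T J, which does not depend on the residual (or the current loss value).
J = jax.grad(f)(w, x)               # df/dw
fisher = jnp.outer(J, J)            # rank-1 Fisher contribution

# Adam/Shampoo-style statistics: built from the *loss* gradient dL/dw.
# Here dL/dw = (f - y) * df/dw, so the residual enters squared and the
# second-moment EMA shrinks as the loss shrinks.
g = jax.grad(loss)(w, x, y)         # dL/dw
beta2 = 0.999                       # illustrative EMA constant
v = jnp.zeros_like(w)
v = beta2 * v + (1 - beta2) * g**2  # Adam's second-moment EMA update
```

The contrast the sketch tries to show: the Fisher term depends only on df/dw, while the loss-gradient EMA scales with the residual and decays with the loss, which is one candidate mechanism for the late-training spikes mentioned above.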
arxiv.org/abs/2506.04805
github.com/riverstone49...
arxiv.org/abs/2509.14185
I don't often submit to NeurIPS, but I have reviewed for this conference almost every year. As a reviewer, why would I spend time trying to give a fair opinion on papers if this is what happens in the end???
www.jst.go.jp/kisoken/act-...
arxiv.org/abs/2509.01440
web.stanford.edu/~boyd/papers...