Looking for great research collaborations
https://riverstone496.github.io/
www.arxiv.org/abs/2505.24333
While I’m deeply impressed by his performance, I’m surprised he’s doing it from an ordinary office chair.
m.youtube.com/watch?v=fDsg...
arxiv.org/abs/2510.09378
x.com/deepcohen/st...
NGD builds its curvature from the function gradient df/dw, while optimizers like Adam and Shampoo build their preconditioners from the loss gradient dL/dw.
I’ve always wondered which is better, since maintaining an EMA of loss gradients might cause loss spikes late in training.
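A minimal sketch of the distinction in JAX, using a toy scalar regression model (the model, data, and EMA constant below are all illustrative, not from the linked paper):

```python
# Toy sketch: curvature from df/dw (NGD/Fisher) vs. statistics from dL/dw (Adam-style).
import jax
import jax.numpy as jnp

def f(w, x):                        # model output: a scalar linear model
    return jnp.dot(w, x)

def loss(w, x, y):                  # squared-error loss
    return 0.5 * (f(w, x) - y) ** 2

w = jnp.array([0.5, -0.3])
x = jnp.array([1.0, 2.0])
y = jnp.array(1.0)

# NGD-style curvature: built from the *function* gradient df/dw.
# For squared error, the per-sample Gauss-Newton / Fisher contribution is
# J^T J, which does not depend on the residual (or the current loss value).
J = jax.grad(f)(w, x)               # df/dw
fisher = jnp.outer(J, J)            # rank-1 Fisher contribution

# Adam/Shampoo-style statistics: built from the *loss* gradient dL/dw.
# Here dL/dw = (f - y) * df/dw, so the residual enters squared and the
# second-moment EMA shrinks as the loss shrinks.
g = jax.grad(loss)(w, x, y)         # dL/dw
beta2 = 0.999                       # illustrative EMA constant
v = jnp.zeros_like(w)
v = beta2 * v + (1 - beta2) * g**2  # Adam's second-moment EMA update
```

The contrast the sketch tries to show: the Fisher term depends only on df/dw, while the loss-gradient EMA scales with the residual and decays with the loss, which is one candidate mechanism for the late-training spikes mentioned above.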
arxiv.org/abs/2506.04805
github.com/riverstone49...
arxiv.org/abs/2509.14185
I don't often submit to NeurIPS, but I have reviewed for this conference almost every year. As a reviewer, why would I spend time trying to give a fair opinion on papers if this is what happens in the end???
www.jst.go.jp/kisoken/act-...
arxiv.org/abs/2509.01440
web.stanford.edu/~boyd/papers...