Weijie Su
wjsu.bsky.social
Associate Professor at University of Pennsylvania
Another new paper, a follow-up:

arxiv.org/abs/2505.20627

It studies an alternative to RLHF: Nash learning from human feedback.
Fundamental Limits of Game-Theoretic LLM Alignment: Smith Consistency and Preference Matching
Nash Learning from Human Feedback is a game-theoretic framework for aligning large language models (LLMs) with human preferences by modeling learning as a two-player zero-sum game. However, using raw ...
arxiv.org
May 30, 2025 at 4:00 PM
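To make the "two-player zero-sum game" framing concrete, here is a toy sketch with a small, finite set of candidate responses. It is an illustration under stated assumptions, not the paper's method. The preference matrix `P` is made up, the function name `preference_game_nash` is hypothetical, and Hedge (multiplicative weights) self-play is a swapped-in solver. The payoff `P - 1/2` turns the constant-sum preference game into a zero-sum one, and the time-averaged strategy approximates a Nash equilibrium.

```python
import numpy as np

def preference_game_nash(P, steps=5000, eta=0.1):
    """Approximate a symmetric Nash equilibrium of a toy preference game.

    P[i, j] is the (assumed) probability that response i is preferred to
    response j, with P[i, j] + P[j, i] == 1. The payoff A = P - 1/2 is then
    antisymmetric, so the game is symmetric and zero-sum with value 0.
    """
    A = np.asarray(P, dtype=float) - 0.5
    n = A.shape[0]
    log_w = np.zeros(n)  # log-weights of the Hedge learner (uniform start)
    total = np.zeros(n)  # running sum of the mixed strategies played
    for _ in range(steps):
        x = np.exp(log_w - log_w.max())  # subtract max for numerical stability
        x /= x.sum()
        total += x
        # Self-play: the opponent uses the same mixture x, so A @ x is the
        # expected payoff of each pure response against the opponent.
        log_w += eta * (A @ x)
    return total / steps  # the time average approximates a Nash equilibrium

if __name__ == "__main__":
    # Response 0 is preferred to every other response (a Condorcet winner).
    P_condorcet = [[0.5, 0.7, 0.8], [0.3, 0.5, 0.6], [0.2, 0.4, 0.5]]
    # Rock-paper-scissors style cycle: no response beats all others.
    P_cycle = [[0.5, 1.0, 0.0], [0.0, 0.5, 1.0], [1.0, 0.0, 0.5]]
    print(preference_game_nash(P_condorcet))
    print(preference_game_nash(P_cycle))
```

In this toy setting the averaged strategy puts almost all of its mass on the Condorcet winner in the first matrix, while in the cyclic matrix it stays spread evenly over the three responses.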
Our analysis shows that, viewed from first principles, the polar decomposition is the natural building block for the update. This gives rise to nuclear-norm scaling: the update automatically vanishes as the gradient becomes small! In contrast, Muon has to manually tune the scaling factor of the orthogonalized matrix to achieve this.
May 29, 2025 at 5:13 PM
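A minimal NumPy sketch of the contrast, under assumptions. It assumes "nuclear-norm scaling" means multiplying the orthogonal polar factor U V^T of the gradient G = U Σ V^T by the nuclear norm ||G||_* (the sum of singular values). The function names are hypothetical, and an exact SVD stands in for the Newton-Schulz iteration that Muon actually uses to approximate the orthogonal factor.

```python
import numpy as np

def polar_factor(G):
    """Orthogonal polar factor U @ Vt of G, computed from a thin SVD."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def muon_style_update(G, lr):
    # Fixed-magnitude step: every singular value of the update equals lr,
    # however small G is, so the step size must be tuned by hand.
    return -lr * polar_factor(G)

def nuclear_scaled_update(G, lr):
    # Scale the polar factor by ||G||_* so the step shrinks to zero with G.
    nuclear_norm = np.linalg.norm(G, ord="nuc")
    return -lr * nuclear_norm * polar_factor(G)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.standard_normal((4, 3))
    for scale in (1.0, 1e-2, 1e-4):
        g = scale * G
        print(scale,
              np.linalg.norm(muon_style_update(g, 0.1)),      # stays constant
              np.linalg.norm(nuclear_scaled_update(g, 0.1)))  # shrinks with scale
```

As the gradient is scaled down, the Muon-style step keeps the same Frobenius norm, while the nuclear-norm-scaled step shrinks in proportion to the gradient.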
Statistical Foundations of Large Language Models
www.weijie-su.com
May 29, 2025 at 1:17 PM
The ranking method was tested at ICML in 2023, 2024, and 2025. I hope we'll finally use it to improve ML/AI review processes soon. Here's an article about the method, from its conception to experimentation:

www.weijie-su.com/openrank/
How to Prevent a Tragedy of the Commons for AI Research?
www.weijie-su.com
May 27, 2025 at 5:08 PM
Add me plz. Thx!
November 28, 2024 at 1:29 AM