Yi (Joshua) Ren
joshuaren.bsky.social
Ph.D. student @cs.ubc.ca, working on ML (learning dynamics, simplicity bias, iterated learning, LLMs). https://joshua-ren.github.io/
With this setup, we can now explain some strange behaviors in DPO, like why the model's confidence on both the chosen and rejected answers drops after long training. 📉📉
Just apply force analysis and remember: the smaller p(y-), the stronger the squeezing effect.
(10/12)
April 21, 2025 at 5:45 AM
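A toy way to see the squeezing effect described in the post above: take a single softmax over four candidate answers and run gradient descent on log p(y-), i.e. push one already-unlikely label down. This is only a minimal sketch, not the paper's setup; the logits, learning rate, step count, and the helper names `softmax` and `push_down` are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def push_down(logits, y_neg, lr=1.0, steps=5):
    """Gradient descent on log p(y_neg) with respect to the logits.

    Hypothetical helper: it mimics a pure "negative gradient" on one label,
    with the logits themselves as the only parameters.
    """
    z = list(logits)
    for _ in range(steps):
        p = softmax(z)
        # d log p(y_neg) / d z_i = 1[i == y_neg] - p_i
        grad = [(1.0 if i == y_neg else 0.0) - p_i for i, p_i in enumerate(p)]
        z = [z_i - lr * g_i for z_i, g_i in zip(z, grad)]
    return softmax(z)


# Assumed toy logits: label 0 is the most likely, label 3 is very unlikely.
logits = [3.0, 1.0, 0.0, -2.0]
before = softmax(logits)
after = push_down(logits, y_neg=3)

for i, (b, a) in enumerate(zip(before, after)):
    print(f"label {i}: before={b:.4f} after={a:.4f}")
```

In this toy case the pushed-down label has very little mass to give, so the top label's gain comes mostly from the other labels: they lose confidence even though the loss only names y-.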
Just like this!!!
(9/12)
April 21, 2025 at 5:45 AM
Time to see how learning dynamics explains those weird behaviors. We observe a consistent trend: similar responses often rise in confidence, then fall. 📈📉
This aligns well with the force analysis perspective. (More supporting experiments are in the paper.)
(5/12)
April 21, 2025 at 5:45 AM