Daniil Tiapkin
@dtiapkin.bsky.social
PhD student at École polytechnique and Université Paris-Saclay 🇫🇷

Reinforcement Learning enjoyer, sometimes even with human feedback

Former student researcher at Google DeepMind Paris

🌐 https://d-tiapkin.github.io/
1/ If you're familiar with RLHF, you've likely heard of reward hacking, where over-optimizing an imperfect reward model leads to unintended behaviors. But what about teacher hacking in knowledge distillation: can the teacher be hacked, like reward models in RLHF?
February 7, 2025 at 7:11 PM

3/ The key intuition is that distillation optimizes a proxy objective, since the teacher isn't perfect, exactly as RLHF optimizes an imperfect reward. To study this, we built a controlled setup in which an oracle model plays the role of the ground-truth objective.
February 7, 2025 at 7:11 PM
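The proxy-objective view of distillation can be sketched in a toy experiment. The snippet below is a minimal illustration, not the thread's actual setup: a "student" distribution is fit to an imperfect "teacher" by gradient descent on the distillation KL, while a hypothetical `oracle` distribution stands in for the ground truth. All names and the noise model are assumptions for illustration only.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    # KL divergence KL(p || q) between two discrete distributions
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
# Hypothetical ground-truth distribution (the "oracle")
oracle = softmax(rng.normal(size=5))
# Imperfect teacher: oracle logits corrupted by noise (the proxy objective)
teacher = softmax(np.log(oracle) + 0.5 * rng.normal(size=5))

# Distillation: gradient descent on KL(teacher || student) w.r.t. student logits;
# the gradient is simply (student - teacher)
logits = np.zeros(5)
for _ in range(500):
    student = softmax(logits)
    logits -= 0.5 * (student - teacher)

student = softmax(logits)
print("proxy loss  KL(teacher || student):", kl(teacher, student))
print("true loss   KL(oracle  || student):", kl(oracle, student))
```

The proxy loss can be driven essentially to zero, but the true loss plateaus at the teacher's own error: optimizing the proxy harder no longer improves, and can even degrade, the objective one actually cares about, which is the analogy to reward hacking.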