Daniil Tiapkin
@dtiapkin.bsky.social
PhD student at École polytechnique and Université Paris-Saclay 🇫🇷

Reinforcement Learning enjoyer, sometimes even with human feedback

Ex. student-researcher at Google DeepMind Paris

🌐 https://d-tiapkin.github.io/
6/ Our paper is out: arxiv.org/abs/2502.02671. This work was the result of my internship at Google DeepMind—huge thanks to the team: Daniele Calandriello, Johan Ferret, Sarah Perrin, Nino Vieillard, @ramealexandre.bsky.social, @mblondel.bsky.social!
[Link card] On Teacher Hacking in Language Model Distillation (arxiv.org)
February 7, 2025 at 7:11 PM
5/ Our suggestions to mitigate teacher hacking:
- Use online generations during distillation;
- Train on more diverse prompt datasets;
- Expand the dataset with multiple completions per prompt.
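The three suggestions above are, at heart, choices about how the distillation dataset is built. Here is a minimal sketch with hypothetical stand-in functions (no real LMs; all names and outputs are made up for illustration):

```python
import random

random.seed(0)

# Hypothetical stand-ins for the two LMs (toy samplers, not real models):
def teacher_complete(prompt):
    return prompt + " ->" + random.choice([" yes", " no", " maybe"])

def student_complete(prompt):
    return prompt + " ->" + random.choice([" yes", " no", " maybe"])

# (ii) diverse prompts: a wider prompt set gives the student fewer chances
# to overfit the teacher's quirks on a narrow slice of inputs.
prompts = [f"question {i}" for i in range(100)]

# (iii) multiple completions per prompt, instead of a single fixed target:
offline_data = [(p, teacher_complete(p)) for p in prompts for _ in range(4)]

# (i) online generations: each step, re-sample completions from the CURRENT
# student and let the teacher supervise them, instead of replaying a fixed
# offline dataset of teacher completions.
def online_batch(batch_prompts):
    return [(p, student_complete(p)) for p in batch_prompts]

batch = online_batch(random.sample(prompts, 8))
print(len(offline_data), "offline pairs;", len(batch), "fresh online pairs")
```

The key design distinction is where the completions come from: in the offline setting the student only ever sees a frozen snapshot of teacher behavior, while the online setting keeps teacher feedback tied to what the student currently generates.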
4/ The results? Teacher hacking is real: better approximating the teacher does not always translate into a better approximation of the oracle. Fortunately, we found some strategies to mitigate it.
3/ The key intuition: because the teacher isn’t perfect, distillation optimizes a proxy objective, exactly like RLHF optimizes an imperfect reward. To study this, we built a controlled setup in which an oracle model stands in for the ground truth: the teacher is distilled from the oracle, and the student from the teacher.
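This proxy-vs-golden gap can be illustrated with a tiny toy sketch (categorical distributions with made-up numbers, not the paper's LM setup): distillation shrinks the divergence to the teacher, while the divergence to the oracle is what actually matters.

```python
import math

def kl(p, q):
    """KL(p || q) for categorical distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy distributions over 3 tokens (illustrative numbers only):
oracle  = [0.6, 0.3, 0.1]   # ground truth -- unobservable in practice
teacher = [0.3, 0.5, 0.2]   # imperfect proxy distilled from the oracle
student = [1/3, 1/3, 1/3]   # student starts uniform

# Distillation pulls the student toward the teacher. We track two metrics:
#   proxy  = KL(teacher || student)  -- what distillation optimizes
#   golden = KL(oracle  || student)  -- what we actually care about
# Teacher hacking = the proxy keeps shrinking while the golden metric stalls.
for t in (0.0, 0.5, 1.0):
    s = [(1 - t) * s0 + t * pT for s0, pT in zip(student, teacher)]
    print(f"t={t:.1f}  proxy={kl(teacher, s):.4f}  golden={kl(oracle, s):.4f}")
```

At t=1 the proxy metric is exactly zero (the student matches the teacher), yet the golden metric bottoms out at KL(oracle || teacher): the teacher's own error is the floor the student inherits.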
2/ In our new work from Google DeepMind, “On Teacher Hacking in Language Model Distillation,” we analyze this possible failure mode. It would be critical if real, since distillation is becoming central to the post-training of modern LLMs.
Hope I'm not too late 😅
November 21, 2024 at 8:13 PM