Reinforcement Learning enjoyer, sometimes even with human feedback
Former student researcher at Google DeepMind Paris
🌐 https://d-tiapkin.github.io/
- Use online generations during distillation;
- Train on more diverse prompt datasets;
- Expand the dataset with multiple completions per prompt.
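
A rough sketch of how the first and last points could fit together, assuming "online generations" means sampling fresh completions from the current model at each training step rather than reusing a fixed offline set. The names `build_online_batch`, `sample_fn`, and `toy_sampler` are hypothetical and only for illustration; a real setup would plug in an actual model sampler.

```python
import random
from typing import Callable, List, Tuple


def build_online_batch(
    prompts: List[str],
    sample_fn: Callable[[str], str],
    completions_per_prompt: int = 4,
) -> List[Tuple[str, str]]:
    """Pair every prompt with several freshly sampled completions."""
    batch = []
    for prompt in prompts:
        for _ in range(completions_per_prompt):
            # Call the sampler each time so completions are generated online,
            # not looked up from a stored dataset.
            batch.append((prompt, sample_fn(prompt)))
    return batch


def toy_sampler(prompt: str) -> str:
    # Hypothetical stand-in for sampling from the current model.
    return f"{prompt} -> completion #{random.randint(0, 999)}"


prompts = ["Explain KL divergence.", "Write a haiku about gradients."]
batch = build_online_batch(prompts, toy_sampler, completions_per_prompt=3)
print(len(batch))  # 6 (2 prompts x 3 completions each)
```

Calling `build_online_batch` again at the next training step would draw new completions for the same prompts, which is the point of generating online.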