Amirhossein Kazemnejad
a-kazemnejad.bsky.social
Working on RL training of LLMs @Mila_Quebec.
Done with my co-author Milad Aghajohari
April 4, 2025 at 7:58 PM
Some example outputs:

This is the "Qwen2.5 3B-base" model, trained for 1000 RL steps only on the CountDown task with a correctness reward.

Checkpoint at huggingface.co/McGill-NLP/n...
April 4, 2025 at 7:58 PM
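For readers unfamiliar with the setup, here is a minimal sketch of what a correctness reward for CountDown could look like. It assumes the common formulation of the task (combine the given numbers with +, -, *, / so that each is used exactly once and the result equals the target) and a binary 0/1 reward. The function name `correctness_reward` and its signature are hypothetical, and the actual reward in the repo may differ.

```python
import ast
import operator
from collections import Counter
from fractions import Fraction

# Only the four basic arithmetic operators are allowed (an assumption about the task).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate a parsed expression exactly, recording every integer literal in `used`."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def correctness_reward(expression, numbers, target):
    """Hypothetical binary reward: 1.0 if the expression uses each given
    number exactly once and evaluates to the target, otherwise 0.0."""
    used = []
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0  # malformed or disallowed output earns no reward
    if Counter(used) != Counter(numbers):
        return 0.0  # every number must appear exactly once
    return 1.0 if value == target else 0.0


print(correctness_reward("(25 - 5) * 4", [4, 5, 25], 80))  # 1.0
print(correctness_reward("25 * 4", [4, 5, 25], 100))       # 0.0 (5 is unused)
```

Parsing with `ast` instead of calling `eval` keeps arbitrary model output from executing, and `Fraction` avoids floating-point error when division is involved.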
Github Repo:
github.com/McGill-NLP/n...

YouTube Video:
www.youtube.com/playlist?lis...

And yes, we recreated DeepSeek R1-Zero-style training on CountDown in ~10h with one A100.
April 4, 2025 at 7:58 PM