Lightnews — Scholar-powered news

Amirhossein Kazemnejad

@a-kazemnejad.bsky.social

13 followers 8 following 4 posts

Working on RL training of LLMs @Mila_Quebec.


Amirhossein Kazemnejad

@a-kazemnejad.bsky.social

Done with my co-author Milad Aghajohari

April 4, 2025 at 7:58 PM

Amirhossein Kazemnejad

@a-kazemnejad.bsky.social

Some example outputs:

This is the "Qwen2.5 3B-base" model trained for 1000 RL steps only on the CountDown task with a correctness reward.

Checkpoint at huggingface.co/McGill-NLP/n...

April 4, 2025 at 7:58 PM

Amirhossein Kazemnejad

@a-kazemnejad.bsky.social

GitHub Repo:
github.com/McGill-NLP/n...

YouTube Video:
www.youtube.com/playlist?lis...

And yes, we recreated DeepSeek R1-Zero-style training on CountDown in ~10h with one A100.

April 4, 2025 at 7:58 PM
