Costa Huang
@vwxyzjn.bsky.social
RL + LLM @ai2.bsky.social; main dev of https://cleanrl.dev/
This is all. Enjoy the new model 😆
May 1, 2025 at 1:21 PM
One fun thing is that our model outperformed Qwen by roughly 26 points on IFEval. What's going on? We built some nice visualization tools and found that our model follows instructions like "write without a comma" well.
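To make that concrete, here is a minimal sketch of an IFEval-style programmatic check for that kind of instruction (illustrative only; these helper names are mine, not the actual eval code):

```python
# Illustrative sketch of IFEval-style verifiable instruction checks.
# These helpers are hypothetical, not the actual IFEval implementation.

def follows_no_comma_constraint(response: str) -> bool:
    """'Write without a comma': pass iff the response contains no comma."""
    return "," not in response

def follows_min_word_count(response: str, min_words: int) -> bool:
    """'Respond in at least N words': pass iff the word count is high enough."""
    return len(response.split()) >= min_words

print(follows_no_comma_constraint("A short answer with no commas at all."))  # True
print(follows_min_word_count("Too short.", min_words=5))                     # False
```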
May 1, 2025 at 1:21 PM
Our 1B model achieves impressive performance. See our official tweet for more details!

bsky.app/profile/ai2....
We're excited to round out the OLMo 2 family with its smallest member, OLMo 2 1B, surpassing peer models like Gemma 3 1B or Llama 3.2 1B. The 1B model should enable rapid iteration for researchers, more local development, and a more complete picture of how our recipe scales.
May 1, 2025 at 1:21 PM
The model checkpoints are available at huggingface.co/collections/....

As always, we uploaded all the intermediate RL checkpoints.
May 1, 2025 at 1:21 PM
We streamlined our release process to include the RLVR intermediate checkpoints as well. They are available as revisions if you want to check them out.

See our updated collection here: huggingface.co/collections/...
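If you want to poke at an intermediate checkpoint, it loads like any other Hugging Face revision. A minimal sketch; the model id and revision name below are placeholders, so grab the real ones from the collection's "Files and versions" tab:

```python
# Loading an intermediate RLVR checkpoint from a Hugging Face revision.
# The model id and revision name are placeholders (assumptions), not real names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/<rlvr-model-id>"  # placeholder: use an id from the collection
revision = "<intermediate-step>"      # placeholder: use a revision listed on the repo

tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_id, revision=revision)
```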
March 13, 2025 at 7:19 PM
Finally, @hamishivi.bsky.social added the initial GRPO async RL script here: github.com/allenai/open.... We are battle-testing it and hope to share more soon!
https://github.com/allenai/open-instruct/blob/main/open_instruct/grpo_vllm_thread_ray_gtrl.py…
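The core idea GRPO adds on top of PPO-style training is the group-relative baseline. A rough sketch of that piece (not the open-instruct implementation, just the idea):

```python
# Rough sketch of GRPO's group-relative advantages (not the open-instruct code).
# Several completions are sampled per prompt, scored with the verifiable reward,
# and normalized within the group, so no learned value model is needed.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar reward per sampled completion."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],   # prompt 1: 2/4 samples verified correct
                        [0.0, 0.0, 0.0, 1.0]])  # prompt 2: 1/4 samples verified correct
print(grpo_advantages(rewards))
```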
February 12, 2025 at 5:33 PM
💾 I included the reproduction commands here:
github.com/allenai/open...
February 12, 2025 at 5:33 PM
📦 Here is the trained model. The main recipe is basically the same, except we used a different RL algorithm, so we are just doing a minor release.

huggingface.co/allenai/Llam...
February 12, 2025 at 5:33 PM
🗡️ The training length is a confounder, but I did run an ablation study on the same `allenai/RLVR-MATH` dataset, using almost identical hyperparams for PPO and GRPO:

The PPO run's MATH score is more consistent with the Llama-3.1-Tulu-3-8B model, but GRPO got higher scores.
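For readers comparing the two: roughly speaking, both runs optimize the same clipped surrogate policy loss; the main differences are how advantages are estimated (GAE with a learned value model for PPO vs. group-normalized rewards for GRPO) and where the KL penalty goes. A generic sketch of the shared clipped loss (simplified; per-token masking omitted):

```python
# Generic PPO-style clipped policy loss, shared by both algorithms (sketch only).
import torch

def clipped_policy_loss(logprobs, old_logprobs, advantages, clip_eps=0.2):
    ratio = torch.exp(logprobs - old_logprobs)                    # pi_theta / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```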
February 12, 2025 at 5:33 PM
📈 Below is the training curve. I think part of the performance gain also comes from running RL for longer.
February 12, 2025 at 5:33 PM
🎆 @natolambert.bsky.social also updated this figure in our paper, for a better visualization :D
February 12, 2025 at 5:33 PM
🎁 We applied the same RLVR dataset (allenai/RLVR-GSM-MATH-IF-Mixed-Constraints) using our new GRPO training script, and the trained model checkpoints are better!
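If you want to look at the data mix itself, it loads with `datasets` as usual. A quick sketch that makes no assumptions about the column layout, it just prints what's there:

```python
# Quick inspection of the RLVR data mix (no assumptions about column names).
from datasets import load_dataset

ds = load_dataset("allenai/RLVR-GSM-MATH-IF-Mixed-Constraints")
print(ds)                  # available splits and columns
first_split = next(iter(ds))
print(ds[first_split][0])  # one example from the first split
```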
February 12, 2025 at 5:33 PM
Thanks @soldni.bsky.social for the better OLMoE base model and for pulling everything through,
@ljvmiranda.bsky.social for on-policy preferences, and many others for coordinating and making the release happen 💪
February 11, 2025 at 3:30 PM
For a cleaned-up version, please refer to Tulu 3 repro commands github.com/allenai/open...
February 11, 2025 at 3:30 PM
For interested folks, I also included the script I used to launch all the SFT / DPO / RLVR experiments here: github.com/allenai/open.... It's not cleaned up, but I hope it shows some traces of the end-to-end workflow.
February 11, 2025 at 3:30 PM
All of our research artifacts are fully open source and released. Check out our HF collection:

huggingface.co/collections/...
February 11, 2025 at 3:30 PM
We hope you love this new OLMoE model! Download our app at apps.apple.com/us/app/ai2-o...
February 11, 2025 at 3:30 PM
This is how our new allenai/OLMoE-1B-7B-0125-Instruct model compares with the existing allenai/OLMoE-1B-7B-0924-Instruct checkpoint :)

Huge gains on GSM8K, DROP, MATH, and AlpacaEval.
February 11, 2025 at 3:30 PM
We found the RLVR + GSM8K recipe to work robustly, and the scores kept going up.
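For the curious, the GSM8K reward in this recipe boils down to a programmatic answer check. A minimal sketch (illustrative only; the actual verifier in open-instruct handles more formats and edge cases):

```python
# Minimal sketch of a GSM8K-style verifiable reward: extract the final number
# from the model output and compare it to the reference's '#### <answer>' field.
import re

def gsm8k_reward(model_output: str, reference: str) -> float:
    gold = reference.split("####")[-1].strip().replace(",", "")
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output.replace(",", ""))
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == gold else 0.0

print(gsm8k_reward("... so the answer is 42.", "#### 42"))  # 1.0
```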
February 11, 2025 at 3:30 PM
So maybe using it in the loss directly instead of in the rewards changes certain things? I am not sure.

Anyway, I just thought the snippet was interesting to share.
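Assuming "it" here refers to the KL penalty (GRPO puts the KL term in the loss, whereas the usual PPO/RLHF recipe folds it into the rewards), here is a rough sketch of the two variants, with shapes and estimators simplified for illustration:

```python
# Two places the KL penalty against the reference policy can live (sketch only).
import torch

def kl_in_reward(rewards, logprobs, ref_logprobs, kl_coef=0.05):
    """PPO/RLHF-style: subtract a per-token KL estimate from the reward signal,
    so the penalty flows into the advantage estimates."""
    return rewards - kl_coef * (logprobs - ref_logprobs)

def kl_in_loss(policy_loss, logprobs, ref_logprobs, kl_coef=0.05):
    """GRPO-style: add a KL term directly to the optimized loss, using the
    exp(q - p) - (q - p) - 1 estimator."""
    log_ratio = ref_logprobs - logprobs
    kl = (torch.exp(log_ratio) - log_ratio - 1).mean()
    return policy_loss + kl_coef * kl
```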
January 31, 2025 at 3:21 PM