bsky.app/profile/ai2....
As always, we uploaded all the intermediate RL checkpoints
See our updated collection here: huggingface.co/collections/...
github.com/allenai/open...
huggingface.co/allenai/Llam...
The PPO model's MATH score is more consistent with the Llama-3.1-Tulu-3-8B model, but GRPO got higher scores.
@ljvmiranda.bsky.social for on-policy preferences, and many others for coordinating and making the release happen 💪
huggingface.co/collections/...
Huge gains on GSM8K, DROP, MATH, and AlpacaEval.
Anyway, just thought the snippet was interesting to share.