Lightnews — Scholar-powered news

Rosmine

@rosmineb.bsky.social

110 followers 340 following 48 posts

Senior ML Scientist @ FAANG working on LLMs
DM me ML questions

Rosmine

@rosmineb.bsky.social

Overview of GRPO (Group Relative Policy Optimization)

GRPO, introduced in the DeepSeekMath paper, is an improvement on PPO.

The motivation is that PPO requires 4 large models: a policy, a value function, a reward model, and a reference model. GRPO removes the need for the value model by using the average reward of a group of sampled outputs as the baseline instead.
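A rough sketch of the group-relative idea, for intuition only: sample several completions for the same prompt, score them with the reward model, and normalize each reward against the group's own mean and standard deviation. The function name, the example rewards, and the use of the population standard deviation are assumptions made for illustration, not the paper's code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std.

    The group mean plays the role of the baseline that a learned
    value model would otherwise provide in PPO.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    # eps avoids division by zero when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 completions sampled from the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so no separate value network is needed to decide what counts as "better than expected".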

January 10, 2025 at 8:06 PM

Rosmine

@rosmineb.bsky.social

- Performance improvement from RLAIF vs. SFT depends on the base model, e.g., for Llama models, SFT is much more effective than RLAIF (see graph).

December 2, 2024 at 5:18 PM

Rosmine

@rosmineb.bsky.social

All I did was post paper summaries. I guess people don’t like my taste in papers

November 28, 2024 at 7:04 PM

Rosmine

@rosmineb.bsky.social

I doubt he's real; the pinned tweet makes me think it's a parody (but mostly I'm suspicious because he followed me, lol)

November 27, 2024 at 2:06 PM

Rosmine

@rosmineb.bsky.social

Training Verifiers to Solve Math Word Problems (2021)
- This paper introduced GSM8K, and showed how using verifiers can significantly improve performance (up to 20+ percentage points compared to finetuning, see graph below)
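A minimal sketch of how a verifier is used at test time, as I understand it: sample many candidate solutions, score each with the verifier, and keep the top-scoring one. In the paper the verifier is a trained model; `toy_verifier` and the candidate strings below are hypothetical stand-ins so the example runs on its own.

```python
def best_of_n(candidates, verifier):
    """Return the candidate solution that the verifier scores highest."""
    return max(candidates, key=verifier)


def toy_verifier(solution):
    """Hypothetical stand-in for a trained verifier.

    A real verifier would be a model that estimates the probability
    that a solution is correct; this one just checks the final number.
    """
    return 1.0 if solution.strip().endswith("18") else 0.1


# Hypothetical sampled solutions to the same word problem.
candidates = [
    "16 - 3 - 4 = 9, 9 * 2 = 17",
    "16 - 3 - 4 = 9, 9 * 2 = 18",
    "16 - 3 = 13, 13 * 2 = 26",
]
best = best_of_n(candidates, toy_verifier)
print(best)
```

The generator only has to produce a correct solution somewhere among its samples; the verifier's job is to pick it out.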

November 25, 2024 at 7:27 PM

Rosmine

@rosmineb.bsky.social

I made a project to play Dance Dance Revolution without a dance pad, instead using 2 high-speed cameras and running each frame through a shallow convnet to classify the steps.

The tape on the floor is so I know where to step. The stools are so I don't accidentally step on the cameras.

November 23, 2024 at 6:16 PM
