Rosmine
@rosmineb.bsky.social
Senior ML Scientist @ FAANG working on LLMs
DM me ml questions
Overview of GRPO (Group Relative Policy Optimization)

GRPO is an improvement on PPO introduced in the DeepSeekMath paper

The motivation is that PPO requires four large models: a policy, a value function, a reward model, and a reference model. GRPO removes the need for the value model.
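A minimal sketch of the group-relative advantage idea, as I understand it (function and variable names are mine, not DeepSeek's code): sample a group of completions per prompt, score them with the reward model, and normalize rewards within the group instead of learning a value baseline.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # rewards: (num_prompts, group_size), one scalar reward per sampled completion
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    # the group-normalized reward replaces the learned value-function baseline
    return (rewards - mean) / (std + eps)

# one prompt, a group of 4 sampled completions
adv = group_relative_advantages(torch.tensor([[1.0, 0.0, 0.5, 0.0]]))
```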
January 10, 2025 at 8:06 PM
I want to train a transformer model to be a random number generator

At first it sounds dumb, but you could leverage GPU non-determinism to make it truly random, not just pseudo random

There are better ways to do rng so I still think it's a bad idea, but a cool bad idea
December 9, 2024 at 9:39 PM
A Critical Evaluation of AI Feedback for Aligning Large Language Models

Investigates the paradigm of modifying model behavior by first doing SFT on data from a teacher model, then following with RLAIF training using a teacher reward model

They found:
December 2, 2024 at 5:17 PM
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
This paper introduces LaTent Reasoning Optimization (LaTRO), a training framework
- Improves zero shot accuracy by +12.5% on GSM8K over base models.
- Doesn't use external feedback or reward models
...
November 27, 2024 at 4:12 PM
Trying a new ML research area is tough. Every time I think I have a good new idea, I google and find there are already 50,000 papers covering it.

I cannot read 50,000 papers per day
November 26, 2024 at 8:14 PM
Training Verifiers to Solve Math Word Problems (2021)
- This paper introduced GSM8K, and showed how using verifiers can significantly improve performance (up to 20+ percentage points compared to finetuning, see graph below)
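For a rough picture of the verifier setup (my sketch, not the paper's code; `sample_solution` and `verifier_score` are hypothetical): sample N candidate solutions, score each with the trained verifier, keep the best one.

```python
def best_of_n(problem, sample_solution, verifier_score, n=100):
    # sample n candidate solutions from the generator model
    candidates = [sample_solution(problem) for _ in range(n)]
    # score each candidate with the trained verifier and return the highest-scoring one
    return max(candidates, key=lambda c: verifier_score(problem, c))
```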
November 25, 2024 at 7:27 PM
Untuned SOAP beats tuned AdamW at every single step
AdamW's been tuned but SOAP and PSGD are just using default params, you love to see it.
November 25, 2024 at 12:08 AM
I made a script to help you quickly become an expert in an ML subfield
How it works:
1. Input a few papers
2. Input a description of the subfield. Could be generic like "optimizers" or highly specific like "improvements on LoRA"
🧵
November 21, 2024 at 6:20 PM
So much learning in re-reading old banger papers

e.g. Adafactor includes
- new low rank matrix approximation algorithm (used for second moment)
- detecting when Adam second moment is out of date
- better beta_2 schedules
- analysis of model training stability

arxiv.org/pdf/1804.04235
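Rough sketch of the low-rank second-moment trick as I read it (one-shot version; in the optimizer the row/column stats are exponential moving averages):

```python
import numpy as np

def factored_second_moment(grad_sq: np.ndarray) -> np.ndarray:
    # grad_sq: elementwise squared gradients of an (n, m) weight matrix
    r = grad_sq.sum(axis=1, keepdims=True)  # row sums, shape (n, 1)
    c = grad_sq.sum(axis=0, keepdims=True)  # column sums, shape (1, m)
    return r @ c / r.sum()                  # rank-1 reconstruction of the full (n, m) second moment
```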
November 20, 2024 at 3:30 PM
I recently set up SSH access to my computer from anywhere, here are some tips:
November 8, 2024 at 6:31 PM
Is it possible to tell if a model was trained on test? I thought it would be easy to check using Membership Inference Attacks
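A naive version of that check (my sketch, not from any paper; `model_loss` is a hypothetical per-example loss function): compare average loss on the test set against fresh held-out data and read a big gap as a contamination signal.

```python
def naive_membership_signal(model_loss, test_examples, fresh_examples):
    # if the model was trained on test, its loss there should be suspiciously low
    test_loss = sum(model_loss(x) for x in test_examples) / len(test_examples)
    fresh_loss = sum(model_loss(x) for x in fresh_examples) / len(fresh_examples)
    return fresh_loss - test_loss  # large positive gap hints at contamination
```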
I looked into it more and turns out it's a lot harder than I thought because...
November 7, 2024 at 6:36 PM