DM me ML questions
GRPO, introduced in the DeepSeekMath paper, is an improvement on PPO
The motivation is that PPO requires four large models: a policy, a value function, a reward model, and a reference model. GRPO removes the need for the value model.
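The core trick, as a minimal sketch (assuming PyTorch; the function name is mine, not from any library): instead of a learned value baseline, GRPO samples a group of completions per prompt and normalizes each completion's reward against the group's mean and std.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # rewards: shape (G,), one scalar reward per sampled completion for
    # the same prompt. Group statistics replace PPO's value network as
    # the baseline for the advantage.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# e.g. 4 completions sampled for one prompt:
adv = grpo_advantages(torch.tensor([0.0, 1.0, 1.0, 0.0]))
# above-average completions get positive advantage, below-average negative
```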
At first it sounds dumb, but you could leverage GPU non-determinism to make it truly random, not just pseudorandom
There are better ways to do RNG, so I still think it's a bad idea, but a cool bad idea
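Where the entropy would come from, as a toy sketch (assumes a CUDA device; this is illustrative, not a usable RNG): atomic float additions can run in a different order each launch, so the rounding error of a big reduction varies run to run.

```python
import torch

# scatter_add_ on CUDA uses atomics, so the accumulation order (and
# therefore the float rounding) can differ across calls on the same data.
x = torch.randn(1_000_000, device="cuda")
idx = torch.zeros(1_000_000, dtype=torch.long, device="cuda")

def nondeterministic_sum() -> float:
    out = torch.zeros(1, device="cuda")
    out.scatter_add_(0, idx, x)
    return out.item()

# Two reductions over identical data can disagree in the low bits;
# that disagreement is (weak, slow, biased) physical entropy.
print(nondeterministic_sum() != nondeterministic_sum())
```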
Investigated the paradigm of modifying model behavior by first doing SFT on data from a teacher model, then following up with RLAIF training against a teacher reward model
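The shape of that pipeline, as a toy runnable sketch (every component here is a trivial stand-in for the real LLMs and trainers; all names are hypothetical, not from the paper):

```python
class Teacher:
    def generate(self, prompt):           # stand-in for the teacher LLM
        return f"{prompt} -> teacher-style answer"

class TeacherRM:
    def score(self, prompt, response):    # stand-in for the teacher reward model
        return 1.0 if "teacher-style" in response else 0.0

class Student:
    def finetune(self, pairs):            # stand-in for an SFT trainer
        self.pairs = pairs
    def generate(self, prompt):
        return self.pairs[0][1]           # toy: parrots the teacher data
    def rl_update(self, prompt, response, reward):
        pass                              # stand-in for a PPO/RLAIF step

teacher, rm, student = Teacher(), TeacherRM(), Student()
prompts = ["q1", "q2"]

# Stage 1: SFT on completions generated by the teacher.
student.finetune([(p, teacher.generate(p)) for p in prompts])

# Stage 2: RLAIF; the teacher's reward model scores the student's own
# samples, and RL optimizes the student against that scalar signal.
for p in prompts:
    y = student.generate(p)
    student.rl_update(p, y, reward=rm.score(p, y))
```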
They found:
This paper introduces LaTent Reasoning Optimization (LaTRO), a training framework that:
- Improves zero-shot accuracy on GSM8K by 12.5% over base models
- Doesn't use external feedback or reward models
...
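The self-rewarding idea, as I read it (my sketch, not the authors' code; assumes an HF-style causal LM with batched token ids): sample a rationale from the model, then score it by the model's own log-likelihood of the gold answer given that rationale, so the policy acts as its own reward model.

```python
import torch

def self_reward(model, question_ids, rationale_ids, answer_ids):
    # Reward = log p(answer | question, rationale) under the model itself;
    # no external reward model is involved.
    input_ids = torch.cat([question_ids, rationale_ids, answer_ids], dim=-1)
    logits = model(input_ids).logits          # HF-style causal LM output
    ans_len = answer_ids.shape[-1]
    # Logits that predict each answer token (shifted left by one).
    ans_logits = logits[..., -ans_len - 1:-1, :]
    logprobs = torch.log_softmax(ans_logits, dim=-1)
    tok_lp = logprobs.gather(-1, answer_ids.unsqueeze(-1)).squeeze(-1)
    return tok_lp.sum(-1)  # higher = rationale better supports the answer
```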
I cannot read 50,000 papers per day
- This paper introduced GSM8K and showed how using verifiers can significantly improve performance (up to 20+ percentage points compared to finetuning; see graph below)
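The verifier recipe, schematically (my sketch; `generate` and `verify` are hypothetical stand-ins for the paper's sampler and trained verifier):

```python
def best_of_n(generate, verify, question, n=100):
    # Sample n candidate solutions, score each with the verifier, and
    # return the one the verifier rates most likely to be correct.
    # The gains come from this sample-then-rerank step.
    candidates = [generate(question) for _ in range(n)]
    return max(candidates, key=lambda s: verify(question, s))
```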
How it works:
1. Input a few papers
2. Input a description of the subfield. Could be generic like "optimizers" or highly specific like "improvements on LoRA"
🧵
e.g. Adafactor includes
- a new low-rank matrix approximation algorithm (used for the second moment)
- detecting when Adam second moment is out of date
- better beta_2 schedules
- analysis of model training stability
arxiv.org/pdf/1804.04235
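The low-rank trick, concretely (a minimal sketch of the factored second moment, not the paper's code): store only the row and column sums of the squared-gradient statistics and reconstruct the full matrix as a normalized outer product.

```python
import numpy as np

def factored_second_moment(v: np.ndarray) -> np.ndarray:
    # Approximate an n x m second-moment matrix V by its row sums r and
    # column sums c:  V ~= outer(r, c) / sum(V).
    # Memory drops from O(n*m) to O(n + m).
    r = v.sum(axis=1)
    c = v.sum(axis=0)
    return np.outer(r, c) / v.sum()

# The approximation is exact when V is rank-1:
V = np.outer([1.0, 2.0], [3.0, 4.0])
assert np.allclose(factored_second_moment(V), V)
```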
I looked into it more and turns out it's a lot harder than I thought because...