Lightnews — Scholar-powered news

Glen Berseth

@glenberseth.bsky.social

1.9K followers 82 following 180 posts

Assistant Prof at @UMontreal @mila-quebec.bsky.social @MontrealRobots . CIFAR AI Chair, RL_Conference chair. Creating generalist problem-solving agents for the real world. He/him/il.

Pinned

Glen Berseth @glenberseth.bsky.social · Oct 8

Creating Generalist robotics policies (GRPs) is tricky. In this video (and code) I share how to create a GRP from scratch from some basic transformer code. This is the first step in my plan to create a course on large models and scaling for RL and Robotics.

Glen Berseth @glenberseth.bsky.social · 1d

For those interested in joining my lab, submit your application via the Mila form. This year I am particularly interested in students with skills/interests in robotics, reinforcement learning, and foundational models, which will push forward the abilities of real-world agents.

Glen Berseth @glenberseth.bsky.social · 5d

Bienvenue!

Glen Berseth @glenberseth.bsky.social · 6d

I am at #COLM2025 today to talk about AI, LLMs and simulation in the social simulation workshop. Come find me, happy to chat about all things AI, embodiment, and simulation.

Glen Berseth @glenberseth.bsky.social · 14d

The same could be said for science.

Eugene Vinitsky 🍒 @eugenevinitsky.bsky.social · Sep 8

The single biggest epistemic challenge in the internet era is remaining calibrated about what "normal" people think while the internet throws up an infinite wall of crazy. Thousands of people sharing an absurd opinion on the internet tells you very little!

Glen Berseth @glenberseth.bsky.social · 14d

Nice work!

Glen Berseth @glenberseth.bsky.social · 15d

There are many ways to learn or compute a critic that can help score the performance of different actions. This is not the full story. If you want more details, go read rlhfbook.com/c/11-policy-...

Glen Berseth @glenberseth.bsky.social · 15d

GRPO is more like REINFORCE than PPO.
1) It does not train a critic (no need with small variance)
2) The SCORE FUNCTION (difficult to call this an advantage) is over a batch using the same initial prompt (similar to the vine sample method from TRPO)
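
The two points above can be illustrated with a minimal sketch. It assumes the commonly described GRPO recipe, in which each completion's reward is normalized against the other completions sampled from the same prompt, and it deliberately omits the clipping and KL terms used in practice. The function names `group_relative_scores` and `reinforce_style_loss` are hypothetical, chosen for illustration only.

```python
from statistics import mean, pstdev


def group_relative_scores(rewards, eps=1e-8):
    """Score each completion relative to its group (same initial prompt).

    No learned critic is involved: the baseline is just the group mean.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def reinforce_style_loss(log_probs, scores):
    """REINFORCE-style surrogate: -mean(score_i * log pi(completion_i | prompt))."""
    return -mean([s * lp for s, lp in zip(scores, log_probs)])


# Four completions sampled from one prompt, with binary rewards (made-up values).
rewards = [1.0, 0.0, 0.0, 1.0]
log_probs = [-1.0, -2.0, -2.0, -1.0]  # hypothetical sequence log-probabilities

scores = group_relative_scores(rewards)  # approximately [1, -1, -1, 1]
loss = reinforce_style_loss(log_probs, scores)
```

Because the baseline comes from the batch itself, a group in which every completion earns the same reward yields all-zero scores and therefore no gradient signal.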

Glen Berseth @glenberseth.bsky.social · 22d

On my way to South Korea for a week packed with robotics at the Conference on Robot Learning, Humanoids2025, and the global forum on mechanical engineering.

Glen Berseth @glenberseth.bsky.social · 23d

Very exciting! Congratulations.

Glen Berseth @glenberseth.bsky.social · 24d

Make a plan for the next 2-3 months
1. Have clear goals/claims
2. Have a clear way to measure progress
3. Share the plan and get #consensus with your collaborators.

Without a plan, how does one know they are making progress? 🤔

Glen Berseth @glenberseth.bsky.social · 24d

One of the most common logical fallacies I see is "GPUs are cooking, therefore progress." I see people with 1/10th the compute get 10x more progress because... they have a more thorough plan. #Moretimethinkinglesstimeburning

Glen Berseth @glenberseth.bsky.social · 24d

Because you do so much awesome stuff!

Glen Berseth @glenberseth.bsky.social · Sep 8

I suggest going out and talking to real people. They provide a much richer signal.

Eugene Vinitsky 🍒 @eugenevinitsky.bsky.social · Sep 8

Glen Berseth @glenberseth.bsky.social · Sep 8

Maybe one of the best use cases.

Glen Berseth @glenberseth.bsky.social · Aug 23

Great work with Cyrus Neary, Omar G. Younis, Artur Kuramshin, and Özgür Aslan.
Check out the paper for more details!

Paper: arxiv.org/abs/2508.12211
Code: github.com/cyrusneary/vlaps

Improving Pre-Trained Vision-Language-Action Policies with Model-Based Search

Pre-trained vision-language-action (VLA) models offer a promising foundation for generalist robot policies, but often produce brittle behaviours or unsafe failures when deployed zero-shot in out-of-di...

Glen Berseth @glenberseth.bsky.social · Aug 23

We compare different checkpoints during the training process.
Vision-Language-Action Planning and Search (VLAPS) significantly outperforms VLA-only baselines on simulated, language-specified robotic tasks, improving success rates by up to 67 percentage points.

Glen Berseth @glenberseth.bsky.social · Aug 23

VLAs offer an avenue for generalist robot policies; however, naively following the action predictions leads to brittle or unsafe behaviours. We introduce VLAPS, which integrates model-based search with pre-trained VLA policies to improve performance without additional training.

Glen Berseth @glenberseth.bsky.social · Aug 21

Efficiency may be the most important. If we can't make these tools economical, they will not last.

Jeff Dean @jeffdean.bsky.social · Aug 21

AI efficiency is important. The median Gemini Apps text prompt in May 2025 used 0.24 Wh of energy (<9 seconds of TV watching) & 0.26 mL (~5 drops) of water. Over 12 months, we reduced the energy footprint of a median text prompt 33x, while improving quality:
cloud.google.com/blog/product...

Glen Berseth @glenberseth.bsky.social · Aug 20

Happy to team up!

Glen Berseth @glenberseth.bsky.social · Aug 20

My lab at @montrealrobotics.bsky.social was honoured to present our recent work to @mark-carney.bsky.social and Evan Solomon, explaining how AI enables new robotics that will drive innovation in Canada. It was a pleasure getting into the details with a quick dive into deterministic policy gradients!

Glen Berseth @glenberseth.bsky.social · Aug 17

Another fantastic Montreal Robotics Summer School! Thanks to our sponsors, organizers, and @mila-quebec.bsky.social, we doubled in size this year. Congratulations again to all the students who make this school happen, and for your progress in machine learning and robotics.

Glen Berseth @glenberseth.bsky.social · Aug 8

The team is already growing

Glen Berseth @glenberseth.bsky.social · Aug 7

@rl-conference.bsky.social will be in Montréal next year at @umontreal-en.bsky.social!

Glen Berseth @glenberseth.bsky.social · Aug 5

For even more details, check out the paper draft.
arxiv.org/abs/2508.01329
For all the details, come talk to me at the @rl-conference.bsky.social Finding the Frame workshop tomorrow.

Is Exploration or Optimization the Problem for Deep Reinforcement Learning?

In the era of deep reinforcement learning, making progress is more complex, as the collected experience must be compressed into a deep model for future exploitation and sampling. Many papers have show...

Glen Berseth @glenberseth.bsky.social · Aug 5

For more details, check out this blog post.
fracturedplane.notion.site/How-well-do-...

How well do RL Algorithms use Their Experience in this Era of Experience? | Notion

by Glen Berseth
