Claas Voelcker

@cvoelcker.bsky.social

2.6K followers 400 following 610 posts

For professional stuff, see https://cvoelcker.de

If I seem very angry, check if I have been watered in the last 24 hours.

Now 🇺🇸 flavoured, previously available in 🇨🇦 and 🇩🇪

"For you" finally knows what I want! More #dndmaps

@mightofmerchants.bsky.social looks like an incredible tool

November 25, 2025 at 7:45 PM

It is as if thousands of researchers suddenly cried out in terror and were suddenly silenced

November 11, 2025 at 9:03 PM

Even though I grieve leaving Toronto, I have to begrudgingly admit that the University of Texas at Austin is pretty gorgeous 😁

Trees casting strong shadows on a brickwork university building at UT Austin

November 4, 2025 at 9:49 PM

I have been told I need to get more modern in my paper promotion! github.com/cvoelcker/reppo / arxiv.org/abs/2507.11019 @marcelhussing.bsky.social

Happy guy sad guy meme with sad text: USE PPO AND TUNE HYPERPARAMETERS FOR WEEKS and happy text: USE REPPO AND GET A POLICY

September 26, 2025 at 2:51 PM

Big if true 🤫: #REPPO works on Atari as well 😱 👾 🚀

Some tuning is still needed, but we are seeing results roughly on par with #PQN.

If you want to test out #REPPO (Atari is not integrated due to issues with envpool and JAX versions), check out github.com/cvoelcker/re...

#reinforcementlearning

A lonely return curve on the ALE game Qbert-v5 for the REPPO algorithm

September 16, 2025 at 1:29 PM

Time to go on a random posting spree:

Textured steel pans are super magical and you should really get one! No sticking at all; so far I've made eggs, pancakes, and stir fry.

Bonus points: really funny pattern

September 14, 2025 at 7:12 PM

Huge shout-out to @sologen.bsky.social and @igilitschenski.bsky.social for putting up with me, my relentless skepticism, and hand-wavy ideas for so many years! Thanks to Wil Cunningham, Florian Shkurti, and Philip Thomas for letting me get away with my thesis 😁

July 26, 2025 at 3:00 PM

🔥 Presenting Relative Entropy Pathwise Policy Optimization #REPPO 🔥
Off-policy #RL (e.g. #TD3) trains by differentiating a critic, while on-policy #RL (e.g. #PPO) uses Monte-Carlo gradients. But is that necessary? Turns out: No! We show how to get critic gradients on-policy. arxiv.org/abs/2507.11019

GIF showing two plots that symbolize the REPPO algorithm. On the left side, four curves track the return of an optimization function, and on the right side, the optimization paths over the objective function are visualized. The GIF shows that Monte-Carlo gradient estimators have a high variance and fail to converge, while surrogate function estimators converge smoothly, but might find suboptimal solutions if the surrogate function is imprecise.

July 17, 2025 at 7:11 PM
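To make the contrast in the post above concrete, here is a toy sketch (not REPPO itself) of the two kinds of gradient estimators for d/dμ E_{x∼N(μ,σ²)}[f(x)]. The objective f(x) = x², the values μ = 1 and σ = 1, and the sample size are illustrative assumptions. Both estimators are unbiased (the true gradient is 2μ = 2), but in this setup the Monte-Carlo score-function estimator has variance 30, while the pathwise estimator, which differentiates f directly the way a critic-based update does, has variance 4.

```python
import random


def f(x: float) -> float:
    """Toy objective standing in for a return or critic value (illustrative assumption)."""
    return x * x


def df_dx(x: float) -> float:
    """Analytic derivative of f; a pathwise estimator needs this, like differentiating a critic."""
    return 2.0 * x


def score_function_grad(mu: float, sigma: float, rng: random.Random) -> float:
    """One Monte-Carlo (REINFORCE-style) sample: f(x) * d/dmu log N(x; mu, sigma^2)."""
    x = rng.gauss(mu, sigma)
    return f(x) * (x - mu) / sigma**2


def pathwise_grad(mu: float, sigma: float, rng: random.Random) -> float:
    """One reparameterized sample: x = mu + sigma * eps, so dx/dmu = 1 and the gradient is f'(x)."""
    eps = rng.gauss(0.0, 1.0)
    x = mu + sigma * eps
    return df_dx(x)


def mean_and_var(samples: list[float]) -> tuple[float, float]:
    """Sample mean and population variance of a list of gradient estimates."""
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / len(samples)
    return m, v


mu, sigma, n = 1.0, 1.0, 50_000
rng = random.Random(0)  # fixed seed so the comparison is reproducible
sf_mean, sf_var = mean_and_var([score_function_grad(mu, sigma, rng) for _ in range(n)])
pw_mean, pw_var = mean_and_var([pathwise_grad(mu, sigma, rng) for _ in range(n)])

# True gradient is 2 * mu; both means should land near it, with very different spreads.
print(f"score-function: mean={sf_mean:.3f} var={sf_var:.2f}")
print(f"pathwise:       mean={pw_mean:.3f} var={pw_var:.2f}")
```

Note that `score_function_grad` never calls `df_dx`: it only needs f's value, which is why Monte-Carlo estimators work without a differentiable critic, at the cost of the higher variance shown here.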

Just... why???

June 26, 2025 at 11:03 PM

I'm happy to announce a new SOTA on the Brax walker2d environment :D I guess this is the clipping-bug policy?

Return curves of an RL algorithm on a log scale. One orange curve rises well above 10,000,000, breaking any reasonable reward limit.

June 6, 2025 at 4:07 PM

So, eh, what???

May 22, 2025 at 5:21 PM

I’m so happy I finally got invited to a formal event again so I can go the extra mile in dress-up 🎩

May 11, 2025 at 4:30 PM

The bad news: my new algorithm forgets everything for some reason in the middle of training
The good news: Apparently continual learning issues are solved and it continues learning exactly as well as before

I've never seen catastrophic forgetting so clean before.

#rl

Return curve of an algorithm that goes up (with some spikes), then drops to 0 and starts learning again cleanly from scratch.

May 9, 2025 at 1:13 PM

God has heard my plea!

May 1, 2025 at 12:23 PM

RIP cat 😭 you were a constant source of joy and back pain! I hope cat heaven never gets mad at you for demanding new food after you only ate half a bowl.

Our black-and-white cat, sitting awkwardly on my back while I work on the couch, lying on my belly.

April 28, 2025 at 9:12 PM

However, we can prevent this by generating a small number of on-policy trajectories from a learned #worldmodel. This leads to remarkably stable training across the most challenging DMC tasks!

For more details, come chat with us in #Singapore 😎

RL reward curves for a hard task in the DMC suite

February 11, 2025 at 10:14 PM

Getting the most out of limited interactions is a fundamental challenge in off-policy reinforcement learning. But when you try to run modern methods like SAC, they diverge as soon as you increase the number of learning steps … because they rely on hallucinated on-policy values.

A diagram highlighting how RL algorithms query old states under new policies.

February 11, 2025 at 10:14 PM

In a desperate attempt to share some alternative German culture today 😬 may I introduce you to the fact that my beautiful mother tongue refers to the beloved but somewhat sterilely named “raccoon” 🦝 as a “Waschbär” (“washing bear”), which is just a hell of a lot cuter.
Prepping to be the “fun uncle” 😁