If I seem very angry, check if I have been watered in the last 24 hours.
Now 🇺🇸 flavoured, previously available in 🇨🇦 and 🇩🇪
@mightofmerchants.bsky.social looks like an incredible tool
Some tuning is still needed, but we are seeing results roughly on par with #PQN.
If you want to try out #REPPO (Atari is not integrated due to issues with envpool and the JAX version), check out github.com/cvoelcker/re...
#reinforcementlearning
Textured steel pans are super magical and you should really get one! No sticking at all, and so far I've made eggs, pancakes, and stir fry.
Bonus points: really funny pattern
Off-policy #RL (e.g. #TD3) trains by differentiating a critic, while on-policy #RL (e.g. #PPO) uses Monte-Carlo gradients. But is that necessary? Turns out: No! We show how to get critic gradients on-policy. arxiv.org/abs/2507.11019
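For anyone new to the distinction, here is a toy sketch (not the method from the paper) contrasting the two gradient estimators on a one-dimensional Gaussian policy with mean `mu`. It assumes a known quadratic critic standing in for a learned one; all function names are hypothetical. The critic (pathwise) gradient pushes `dQ/da` back through the sampled action, while the Monte-Carlo (score-function) gradient weights `Q(a)` by the gradient of the log-probability.

```python
import random
import statistics


def q_value(a, target):
    # Hypothetical "perfect critic": Q(a) = -(a - target)^2, maximized at a = target.
    return -(a - target) ** 2


def dq_da(a, target):
    # Derivative of the assumed critic with respect to the action.
    return -2.0 * (a - target)


def pathwise_grad(mu, sigma, target, n, rng):
    # Critic gradient via reparameterization: a = mu + sigma * eps, so da/dmu = 1.
    grads = []
    for _ in range(n):
        a = mu + sigma * rng.gauss(0.0, 1.0)
        grads.append(dq_da(a, target) * 1.0)
    return statistics.mean(grads), statistics.pstdev(grads)


def score_function_grad(mu, sigma, target, n, rng):
    # Monte-Carlo (REINFORCE-style) gradient: Q(a) * d/dmu log N(a; mu, sigma^2),
    # where d/dmu log N(a; mu, sigma^2) = (a - mu) / sigma^2.
    grads = []
    for _ in range(n):
        a = mu + sigma * rng.gauss(0.0, 1.0)
        grads.append(q_value(a, target) * (a - mu) / sigma ** 2)
    return statistics.mean(grads), statistics.pstdev(grads)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, sigma, target = 0.0, 0.5, 1.0
    # Exact gradient of E[Q] = -(mu - target)^2 - sigma^2 with respect to mu.
    true_grad = -2.0 * (mu - target)
    print("true gradient:", true_grad)
    print("pathwise (mean, std):", pathwise_grad(mu, sigma, target, 10_000, rng))
    print("score function (mean, std):", score_function_grad(mu, sigma, target, 10_000, rng))
```

Both estimators are unbiased for the true gradient here, but the per-sample spread of the critic-based estimate is much smaller, which is one way to see why differentiating a critic is attractive.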
The good news: Apparently continual learning issues are solved and it continues learning exactly as well as before
I've never seen catastrophic forgetting so clean before.
#rl
For more details, come chat with us in #Singapore 😎
Prepping to be the “fun uncle” 😁