All posts are my own.
📖 Replicable Reinforcement Learning with Linear Function Approximation
🔗 arxiv.org/abs/2509.08660
In this paper, we study formal replicability in RL with linear function approximation. The... (1/6)
* Replicable Reinforcement Learning with Linear Function Approximation
* Relative Entropy Pathwise Policy Optimization
We already posted about the 2nd one (below); I'll get to the first one in a bit here.
Off-policy #RL (e.g. #TD3) trains by differentiating a critic, while on-policy #RL (e.g. #PPO) uses Monte Carlo gradients. But is that necessary? Turns out: no! We show how to get critic gradients on-policy. arxiv.org/abs/2507.11019
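A toy sketch of the distinction above (my own illustration, not the paper's method): on a Gaussian with mean mu, the score-function (REINFORCE-style, Monte Carlo) estimator uses only function values, while the pathwise estimator differentiates through the sample, the way TD3 differentiates a critic. Both are unbiased for the same gradient; the pathwise one typically has far lower variance.

```python
import numpy as np

# Estimate d/dmu E_{x ~ N(mu, 1)}[f(x)] with f(x) = x^2 (true gradient: 2*mu).
rng = np.random.default_rng(0)
mu, n = 1.5, 200_000
eps = rng.standard_normal(n)
x = mu + eps

f = x**2    # "return" signal, used as-is by the Monte Carlo estimator
df = 2 * x  # derivative of f, playing the role of a critic gradient

# Score-function (Monte Carlo) estimator: E[f(x) * d/dmu log p(x; mu)]
score_grad = np.mean(f * (x - mu))

# Pathwise estimator: differentiate through the reparameterization x = mu + eps
pathwise_grad = np.mean(df)

print(score_grad, pathwise_grad)  # both concentrate around 2*mu = 3.0
```

The pathwise column converges much faster in practice, which is one reason differentiating a critic is attractive when it is available.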
04/29: Max Simchowitz (CMU)
05/06: Jeongyeol Kwon (Univ. of Wisconsin-Madison)
05/20: Sikata Sengupta & Marcel Hussing (Univ. of Pennsylvania)
05/27: Dhruv Rohatgi (MIT)
06/03: David Janz (Univ. of Oxford)
06/10: Nneka Okolo (MIT)
Two by my team at the Adaptive Agents Lab (Adage) together with collaborators:
A Truncated Newton Method for Optimal Transport
openreview.net/forum?id=gWr...
MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
openreview.net/forum?id=6Rt...
#ICLR2025
Day 1 👇 #ICLR2025
mila.quebec/en/news/foll...
Good news! We’re extending the #CoLLAs2025 submission deadlines:
📝 Abstracts: Feb 26, 2025, 23:59 AoE
📄 Papers: Mar 3, 2025, 23:59 AoE
More time to refine your work—don't miss this chance to contribute to #lifelong-learning research! 🚀
🔗 lifelong-ml.cc
Well, here is a small "sky thread" (written on a ✈️) about something I recently discovered: e-values!
They are an alternative to the standard p-values as a measure of statistical significance. 1/N
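A minimal numerical sketch of the idea (my own example, not from the thread): an e-value is a nonnegative statistic with expectation at most 1 under the null, so by Markov's inequality, rejecting when it exceeds 1/alpha controls the type-I error at level alpha. A likelihood ratio against any fixed alternative is the canonical example.

```python
import numpy as np

# Under H0: X ~ N(0, 1), the likelihood ratio L = p1(X)/p0(X) against a fixed
# alternative p1 = N(mu1, 1) satisfies E_H0[L] = 1, so L is an e-value.
rng = np.random.default_rng(1)
n, alpha = 100_000, 0.05
x = rng.standard_normal(n)     # data drawn under the null

mu1 = 1.0                      # fixed alternative N(1, 1)
log_lr = mu1 * x - mu1**2 / 2  # log p1(x)/p0(x) for unit-variance Gaussians
e_values = np.exp(log_lr)

print(e_values.mean())                 # close to 1 under the null
print((e_values >= 1 / alpha).mean())  # empirical type-I error, at most alpha
```

Unlike p-values, e-values from independent batches can simply be multiplied to accumulate evidence, which is what makes them attractive for optional stopping.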
Excited to share that our MAD-TD paper got a spotlight at #ICLR25! Check out Claas' thread on how to get the most out of your compute/data buck when training from scratch.
Oh, and here is this interesting and hard open problem that someone should solve.
Future work sections in empirical ML papers:
We leave hyperparameter optimization for future work.