Mattie Fellows
mattieml.bsky.social
Reinforcement Learning Postdoc at FLAIR, University of Oxford @universityofoxford.bsky.social

All opinions are my own.
Reposted by Mattie Fellows
PQN, a recently introduced value-based method (bsky.app/profile/matt...), has a data-collection scheme similar to PPO's. We see a similar trend as with PPO, but much less pronounced. It is possible our findings correlate more strongly with policy-based methods.
9/
June 5, 2025 at 2:31 PM
2/2 🚀 Our new paper below tackles two major issues in offline RL: high online sample complexity and the lack of online performance guarantees. We obtain accurate regret estimation and achieve performance competitive with the best online hyperparameter tuning methods, both
using only offline data! 👇
arxiv.org
May 30, 2025 at 8:39 AM
TeXstudio - A LaTeX editor
www.texstudio.org
May 14, 2025 at 9:34 AM
The techniques used in our work and in Bhandari et al. are standard in the analysis of stochastic approximation algorithms and have been around for a long time. Moreover, the point of the blog was to be an expositional tool that acts as a complete analysis of TD. But sure, I'll add even more references...
March 21, 2025 at 10:19 AM
In our paper we quite clearly state at several points, including 'convergence of TD methods has been studied extensively (Watkins & Dayan, 1992; Tsitsiklis & Van Roy, 1997; Dalal et al., 2017; Bhandari et al., 2018; Srikant & Ying, 2019)' and 'our proof is similar to Bhandari et al. (2018).'
March 21, 2025 at 10:12 AM
Crucially, techniques that study linear function approximation could not be used to understand components like LayerNorm.
March 21, 2025 at 9:13 AM
As far as I'm aware, and please correct me if I'm wrong, I've never seen the derivation of the path mean Jacobian. It really is a key contribution of our analysis, as it allows us to study nonlinear systems (i.e. ACTUAL neural nets used in practice) that many papers like Bhandari et al. can't.
March 21, 2025 at 9:12 AM
we cite said papers several times in our work and the blogs...
March 21, 2025 at 9:06 AM
There are so many great places in the world; if anything, it would be a positive to regularly see more conferences in countries other than the US/Austria/Canada.
March 20, 2025 at 9:47 AM
On it
November 15, 2024 at 7:31 AM
Feel free to add me!
November 15, 2024 at 7:28 AM