🌐 https://abhishekshar.com/
We show that a simple approach to learning safe RL policies can outperform most offline RL methods. (+theoretical guarantees!)
How? Just allow the state-action pairs that have been seen enough times! 🤯
arxiv.org/abs/2410.09361
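
A rough sketch of the "seen enough times" idea in a tabular setting. This is an illustration only, not the paper's implementation: the function names, the `(state, action, reward, next_state)` tuple layout, and the `min_count` threshold are all assumptions made for this example.

```python
from collections import Counter


def allowed_state_actions(dataset, min_count):
    """Return the (state, action) pairs that appear at least min_count times.

    dataset is assumed to be an iterable of (state, action, reward, next_state)
    tuples with hashable states and actions (hypothetical layout).
    """
    counts = Counter((s, a) for s, a, *_ in dataset)
    return {sa for sa, n in counts.items() if n >= min_count}


def greedy_action(q_values, state, actions, allowed):
    """Pick the highest-Q action among the allowed ones for this state.

    Returns None when no action for the state has been seen often enough.
    """
    candidates = [a for a in actions if (state, a) in allowed]
    if not candidates:
        return None
    return max(candidates, key=lambda a: q_values.get((state, a), float("-inf")))


# Toy offline dataset: "left" is seen twice in state 0, "right" only once.
dataset = [
    (0, "left", 0.0, 1),
    (0, "left", 0.0, 1),
    (0, "right", 1.0, 2),
]
allowed = allowed_state_actions(dataset, min_count=2)

# Even though "right" has the larger Q-value, it is filtered out as under-observed.
q_values = {(0, "left"): 0.1, (0, "right"): 5.0}
choice = greedy_action(q_values, 0, ["left", "right"], allowed)
```

With the threshold at 2, only `(0, "left")` survives the filter, so the policy stays on well-supported actions even when an under-observed one looks better.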