Maëva L'Hôtellier
maevalhotellier.bsky.social
Studying learning and decision-making in humans | HRL team - ENS Ulm |
Reposted by Maëva L'Hôtellier
@magdalenasabat.bsky.social used 🔌 ephys to show that neurons in ferret auditory cortex integrate sounds within fixed temporal windows (~15–150 ms) that lengthen in non-primary auditory cortex, independent of the stimulus information rate.
▶️ www.biorxiv.org/content/10.1...
#Neuroscience
Neurons in auditory cortex integrate information within constrained temporal windows that are invariant to the stimulus context and information rate
Much remains unknown about the computations that allow animals to flexibly integrate across multiple timescales in natural sounds. One key question is whether multiscale integration is accomplished by...
February 17, 2025 at 1:39 PM
Link to the preprint:
osf.io/preprints/ps...
December 10, 2024 at 6:14 PM
Questions or thoughts? Let’s discuss!
Reach out — we’d love to hear from you! 🙌
December 10, 2024 at 6:02 PM
Why does it matter? 🤔
Our work aims to bridge cognitive science and machine learning, showing how human-inspired principles like reward normalization can improve reinforcement learning AI systems!
December 10, 2024 at 6:02 PM
What about Deep Decision Trees? 🌳
We further extend the RA model by integrating a temporal-difference component into the dynamic range updates. With this extension, we show that the magnitude-invariance properties of the RA model persist in multi-step tasks.
December 10, 2024 at 6:02 PM
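For intuition, here is one way a temporal-difference component could be folded into the range updates in a multi-step setting; the rule, parameter names (alpha_r, gamma), and toy usage are illustrative assumptions rather than the preprint's exact equations.

```python
# Illustrative sketch only: a TD(0)-style bootstrapped target combined with
# dynamically tracked range variables (R_min, R_max).
import numpy as np

def ra_td_update(Q, R_min, R_max, s, a, r, s_next,
                 alpha=0.3, alpha_r=0.1, gamma=0.95):
    target = r + gamma * Q[s_next].max()          # standard bootstrapped target
    # let the range variables track the targets they normalize
    R_max += alpha_r * (max(target, R_max) - R_max)
    R_min += alpha_r * (min(target, R_min) - R_min)
    norm_target = (target - R_min) / (R_max - R_min + 1e-8)
    Q[s, a] += alpha * (norm_target - Q[s, a])    # delta rule on the normalized target
    return Q, R_min, R_max

# toy usage: 3 states, 2 actions
Q = np.zeros((3, 2))
Q, R_min, R_max = ra_td_update(Q, R_min=0.0, R_max=1.0, s=0, a=1, r=5.0, s_next=2)
```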
With this enhanced model, we generalize the main findings to other bandit settings: the dynamic RA model outperforms the ABS model in several bandit tasks with noisy outcomes, non-stationary rewards, and even multiple options.
December 10, 2024 at 6:02 PM
Once these basic properties are demonstrated in a simplified set-up, we enhance the RA model to cope with stochastic and volatile environments by dynamically adjusting its internal range variables (Rmax / Rmin).
December 10, 2024 at 6:02 PM
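As a rough illustration of what "dynamically adjusting" could look like, the range variables can be learned online with their own delta rules; the asymmetric learning rates below (fast expansion, slow contraction) are an assumption for the sketch, not the preprint's exact parameterization.

```python
# Illustrative sketch: R_min and R_max drift toward each observed outcome,
# quickly when it falls outside the current range and slowly otherwise,
# so the range can track noisy and non-stationary reward scales.
def update_range(r, R_min, R_max, alpha_expand=0.3, alpha_contract=0.03):
    rate_max = alpha_expand if r > R_max else alpha_contract
    rate_min = alpha_expand if r < R_min else alpha_contract
    R_max += rate_max * (r - R_max)
    R_min += rate_min * (r - R_min)
    return R_min, R_max
```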
In contrast, the RA model, by constraining all rewards to a similar scale, efficiently balances exploration and exploitation without the need for task-specific adjustment!
December 10, 2024 at 6:02 PM
Crucially, modifying the temperature (𝛽) of the softmax function does not solve the standard model's problem; it simply shifts the performance peak along the magnitude axis.
Thus, to achieve high performance, the ABS model requires tuning 𝛽 to the reward magnitudes at stake.
December 10, 2024 at 6:02 PM
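A quick numerical way to see this (illustrative values, not the paper's): multiplying all rewards by some factor k has the same effect on the softmax choice probabilities as multiplying 𝛽 by k, so a 𝛽 calibrated for one magnitude is miscalibrated for every other.

```python
# Scaling rewards by k while dividing beta by k leaves the softmax unchanged,
# which is why retuning beta only relocates the performance peak.
import numpy as np

def softmax(q, beta):
    z = beta * (q - q.max())
    return np.exp(z) / np.exp(z).sum()

q = np.array([0.4, 0.6])
print(softmax(q, beta=10.0))              # [0.119, 0.881]
print(softmax(100 * q, beta=10.0 / 100))  # identical probabilities
```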
Agent-level insights: ABS performance drops to chance due to over-exploration with small rewards and over-exploitation with large rewards.
In contrast, the RA model maintains consistent, scale-invariant performance.
December 10, 2024 at 6:02 PM
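To make this failure mode concrete, here is a toy softmax computation with made-up values: at a fixed temperature, tiny value differences yield near-uniform choice (over-exploration), while huge ones make choice effectively deterministic from the first noisy estimates (over-exploitation).

```python
# Same softmax, same temperature, very different behavior across reward scales.
import numpy as np

def softmax(q, beta=3.0):
    z = beta * (q - q.max())
    return np.exp(z) / np.exp(z).sum()

print(softmax(np.array([0.01, 0.02])))  # ~[0.49, 0.51]: close to random choice
print(softmax(np.array([10.0, 20.0])))  # ~[0.00, 1.00]: locks in immediately
```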
First, we simulate ABS and RA behavior in bandit tasks with various magnitude and discriminability levels.

As expected, the standard model is highly dependent on the task levels, while the RA model achieves high accuracy over the whole range of values tested!
December 10, 2024 at 6:02 PM
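For readers who want to play with this, below is a small self-contained simulation in the same spirit (two-armed bandit, delta-rule learners, softmax choice). The task parameters, learning rates, and the simplified normalization by a known reward range are assumptions for illustration; they do not reproduce the paper's exact simulations.

```python
# Toy comparison of an absolute (ABS) learner and a range-normalizing (RA)
# learner across reward magnitudes. Illustrative parameters only.
import numpy as np

rng = np.random.default_rng(0)

def softmax(q, beta=5.0):
    z = beta * (q - q.max())
    return np.exp(z) / np.exp(z).sum()

def run_bandit(magnitude, normalize, n_trials=200, alpha=0.3):
    means = magnitude * np.array([0.25, 0.75])     # arm 1 is objectively better
    q = np.zeros(2)
    correct = 0
    for _ in range(n_trials):
        a = rng.choice(2, p=softmax(q))
        r = means[a] + 0.1 * magnitude * rng.standard_normal()
        if normalize:                              # RA: rescale by the task's (known) range
            r = r / magnitude
        q[a] += alpha * (r - q[a])                 # same delta rule for both models
        correct += int(a == 1)
    return correct / n_trials

for mag in [0.1, 1.0, 10.0, 100.0]:
    print(f"magnitude {mag:>5}: "
          f"ABS {run_bandit(mag, normalize=False):.2f} | "
          f"RA {run_bandit(mag, normalize=True):.2f}")
```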
To avoid magnitude-dependence, we propose the Range-Adapted (RA) model: RA normalizes rewards, enabling consistent representation of subjective values within a constrained space, independent of reward magnitude.
December 10, 2024 at 6:02 PM
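In pseudocode terms, the core idea can be sketched in a few lines of Python (variable names, learning rate, and the exact placement of the normalization are illustrative assumptions, not the preprint's notation):

```python
# Hedged sketch of range adaptation: rewards are rescaled by the observed
# range (R_min, R_max) before entering the value update, so learned values
# stay on a comparable scale whatever the task's reward magnitude.
def range_normalize(r, R_min, R_max, eps=1e-8):
    return (r - R_min) / (R_max - R_min + eps)

def ra_update(q, action, r, R_min, R_max, alpha=0.3):
    r_norm = range_normalize(r, R_min, R_max)
    q[action] += alpha * (r_norm - q[action])   # delta rule on the normalized reward
    return q
```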
Standard reinforcement learning algorithms encode rewards in an unbiased, absolute manner (ABS), which makes their performance magnitude-dependent.
December 10, 2024 at 6:02 PM
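For reference, here is a minimal sketch of such an "absolute" learner, assuming a standard delta-rule update and softmax choice (parameter values are placeholders): because raw reward magnitudes drive the learned values, choice stochasticity ends up depending on the reward scale.

```python
# Minimal ABS learner sketch: raw rewards drive the value update and,
# through the learned values, the softmax choice probabilities.
import numpy as np

def softmax(q, beta=5.0):
    z = beta * (q - q.max())                   # subtract max for numerical stability
    return np.exp(z) / np.exp(z).sum()

def abs_update(q, action, r, alpha=0.3):
    q[action] += alpha * (r - q[action])       # prediction-error update on the raw reward
    return q

q = abs_update(np.zeros(2), action=0, r=50.0)  # one large outcome...
print(softmax(q))                              # ...and choice is already ~[1.0, 0.0]
```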
This work was done in collaboration with Jérémy Pérez, under the supervision of @stepalminteri.bsky.social 👥

Let's now dive into the study!
December 10, 2024 at 6:02 PM