Fixating on multi-agent RL, Neuro-AI and decisions
Ēka ē-akimiht
https://danemalenfant.com/
A policy-gradient agent's performance suffers with more agents, but self-correction still stabilizes learning: arxiv.org/abs/2505.20579
To communicate this to a general audience and the #art community, I built a minimal task: two Gaussian bandits. One agent optimizes with entropy; the other doesn’t. Mid-training, the reward distribution jumps.
I proposed a reinforcement-learning (RL) demo: add a maximum-entropy term to increase the longevity of systems in a non-stationary environment. This is well known to the RL research community: openreview.net/forum?id=PtS...
(photo by Félix Bonne-Vie)
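The demo above can be sketched in code. This is a minimal illustration, not the actual demo: it assumes a REINFORCE-style softmax policy over two Gaussian arms, with an optional entropy bonus, and a mid-training swap of the arm means to model the non-stationary jump. All function names and hyperparameters here are illustrative.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def train(entropy_coef, steps=4000, lr=0.1, seed=0):
    """REINFORCE on a two-armed Gaussian bandit whose means swap mid-training."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(2)
    means = np.array([1.0, 0.0])      # arm 0 starts out better
    rewards = []
    for t in range(steps):
        if t == steps // 2:           # non-stationary jump: the best arm swaps
            means = means[::-1].copy()
        p = softmax(logits)
        a = rng.choice(2, p=p)
        r = rng.normal(means[a], 1.0)
        rewards.append(r)
        # Gradient of log pi(a) w.r.t. the logits of a softmax policy: onehot(a) - p
        pg = -p.copy()
        pg[a] += 1.0
        # Gradient of the entropy H = -sum p log p w.r.t. the logits: -p * (log p + H)
        logp = np.log(p + 1e-12)
        H = -(p * logp).sum()
        ent = -p * (logp + H)
        logits += lr * (r * pg + entropy_coef * ent)
    return float(np.mean(rewards[-500:]))  # average reward after the jump
```

Comparing the post-jump average of `train(0.0)` against `train(0.05)` is one way to show the effect: without the entropy term the policy can collapse onto the pre-jump arm, while the bonus keeps enough exploration alive to adapt.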
Rather than "if x then y," this tested "if not x, then not y."
This inhibits learning of the sub-policy for maximizing collective reward. Agents compete even with a larger reward signal not to