We’d love feedback, extensions, or critiques.
@neuralreckoning.bsky.social @fzenke.bsky.social @wellingmax.bsky.social
#NeuroAI
6/6
Because it's not just a distance: it's the cost of believing your own model too much.
Minimizing KL = reducing surprise = optimizing variational free energy.
A silent principle behind robust inference.
5/6
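For readers who want the identity behind that line spelled out: with an approximate posterior q(z) and generative model p(x, z) (generic textbook notation, not symbols taken from the paper), the variational free energy decomposes as

```latex
\mathcal{F}[q]
\;=\; \mathbb{E}_{q(z)}\!\left[\log \frac{q(z)}{p(x,z)}\right]
\;=\; D_{\mathrm{KL}}\!\big(q(z)\,\|\,p(z\mid x)\big) \;-\; \log p(x)
```

Since log p(x) does not depend on q, minimizing the free energy over q is exactly minimizing the KL term; and because the KL is non-negative, the free energy upper-bounds the surprise -log p(x).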
- A family of importance-weighted straight-through estimators (IW-ST) that unifies and generalizes previous methods (baseline sketch below).
- No need for backprop-through-noise tricks.
- No batch norm.
Just clean, effective training.
4/6
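As a concrete anchor, here is a minimal plain straight-through estimator for binary units in PyTorch, i.e. the baseline that the IW-ST family is said to unify and generalize. This is an illustrative sketch only: the names are ours, and the importance weighting itself is not shown.

```python
import torch


class BinaryStraightThrough(torch.autograd.Function):
    """Classic straight-through estimator (STE) for a hard binary unit.

    Forward: threshold the pre-activation to {0, 1}.
    Backward: pass the incoming gradient through as if the threshold
    were the identity (the surrogate that an importance-weighted
    variant would reweight rather than copy verbatim).
    """

    @staticmethod
    def forward(ctx, logits):
        return (logits > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


def binary_activation(logits):
    return BinaryStraightThrough.apply(logits)


# Usage: gradients reach the weights despite the hard threshold.
x = torch.randn(8, 16)
w = torch.randn(16, 4, requires_grad=True)
h = binary_activation(x @ w)   # forward pass is discrete
h.sum().backward()             # backward pass uses the identity surrogate
print(w.grad.shape)            # torch.Size([16, 4])
```

The forward pass stays discrete while the backward pass substitutes a surrogate gradient; estimators in this family differ mainly in how that surrogate is weighted.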
This lets us derive the loss from first principles, grounded in variational free energy rather than heuristics.
3/6
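For reference, the canonical loss of this kind is the negative evidence lower bound, shown below in standard notation; the paper's loss for discrete activations may differ in form, but the point is that both terms come from the same free-energy objective rather than being added by hand.

```latex
\mathcal{L}(\theta,\phi)
\;=\; \mathbb{E}_{q_\phi(z\mid x)}\!\big[-\log p_\theta(x\mid z)\big]
\;+\; D_{\mathrm{KL}}\!\big(q_\phi(z\mid x)\,\|\,p_\theta(z)\big)
```

A data-fit term plus a KL regularizer, both falling out of the same variational objective.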
Why? Discrete activations → non-differentiable.
Most current methods either approximate gradients or add noisy surrogates.
We do something different.
2/6
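To see the problem concretely: a discrete activation is piecewise constant, so its exact gradient is zero almost everywhere and nothing upstream receives a learning signal. A short generic PyTorch illustration (not code from the paper):

```python
import torch

x = torch.randn(5, requires_grad=True)
y = torch.sign(x)      # discrete output: piecewise constant in x
y.sum().backward()     # autograd's gradient of sign is 0 almost everywhere
print(x.grad)          # tensor([0., 0., 0., 0., 0.]) -> no learning signal
```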