Florentin Guth
@florentinguth.bsky.social
Postdoc at NYU CDS and Flatiron CCN. Wants to understand why deep learning works.
Reposted by Florentin Guth
Next appointment: 31st July 2025 – 16:00 CEST on Zoom with 🔵Keynote: @pseudomanifold.topology.rocks (University of Fribourg) 🔴 @florentinguth.bsky.social (NYU & Flatiron)
July 10, 2025 at 8:47 AM
What I meant is that there are generalizations of the CLT to infinite variance. The limit is then an alpha-stable distribution (which includes Gaussian and Cauchy, but not Gumbel). Also, even if x is heavy-tailed, log p(x) typically is not. So a product of Cauchy distributions has a Gaussian log p(x)!
June 8, 2025 at 6:39 PM
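A quick numerical check of the Cauchy claim above (my own sketch, not from the paper): draw samples with iid Cauchy coordinates, compute log p(x) as the sum of per-coordinate log-densities, and test how Gaussian the result looks.

```python
import numpy as np
from scipy import stats

# log p(x) = sum_i log p(x_i) is a sum of iid terms, so the CLT applies to it
# even though x itself has no finite moments.
rng = np.random.default_rng(0)
d, n = 1000, 10000                           # dimension and number of samples (arbitrary)

x = rng.standard_cauchy(size=(n, d))         # samples from the product of Cauchys
log_p = stats.cauchy.logpdf(x).sum(axis=1)   # log-density of each sample

z = (log_p - log_p.mean()) / log_p.std()     # standardize and compare to N(0, 1)
print(stats.kstest(z, "norm"))               # small statistic => close to Gaussian
```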
At the same time, there are simple distributions that have Gumbel-distributed log probabilities. The simplest example I could find is a Gaussian scale mixture where the variance is exponentially distributed. So it is not clear if we will be able to say something more about this! 2/2
June 8, 2025 at 5:00 PM
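For the curious, here is a rough way to probe that example numerically (a sketch under my own choices of dimension and Monte Carlo budget, not a construction from the paper): sample from the scale mixture with a single shared variance v ~ Exp(1), estimate log p(x) by Monte Carlo over v, and inspect the resulting distribution.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
d, n, k = 16, 2000, 5000          # data dimension, #samples, #MC draws (arbitrary)

# Sample from the mixture: one variance v ~ Exp(1) shared by all coordinates.
v = rng.exponential(size=n)
x = np.sqrt(v)[:, None] * rng.standard_normal((n, d))

# Rough Monte Carlo estimate of log p(x) = log E_v[N(x; 0, v I)], with fresh
# draws of v and a logsumexp for numerical stability.
v_mc = rng.exponential(size=k)
sq = (x ** 2).sum(axis=1)                                   # ||x||^2 per sample
log_gauss = -0.5 * (sq[:, None] / v_mc[None, :]
                    + d * np.log(2 * np.pi * v_mc)[None, :])
log_p = logsumexp(log_gauss, axis=1) - np.log(k)

print(np.percentile(log_p, [1, 25, 50, 75, 99]))            # inspect the tails
```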
If you have independent components, even if heavy-tailed, then log p(x) is a sum of iid variables and is thus distributed according to a (sum) stable law. A conjecture is that the minimum comes from a logsumexp, so a mixture distribution (sum of p) rather than a product (sum of log p). 1/2
June 8, 2025 at 5:00 PM
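To make the product-vs-mixture distinction concrete (standard formulas, my own toy numbers): for independent components, log p(x) is a plain sum of per-coordinate log-densities, while for a mixture it is a logsumexp over components.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

x = np.array([0.3, -1.2, 2.5])

# Product of independent standard Gaussians: log p(x) = sum_i log p(x_i)
log_p_product = stats.norm.logpdf(x).sum()

# Mixture of two isotropic Gaussians: log p(x) = logsumexp_j(log w_j + log N(x; mu_j, I))
w = np.array([0.5, 0.5])
mu = np.array([[-3.0, -3.0, -3.0], [3.0, 3.0, 3.0]])
comp_logpdf = stats.norm.logpdf(x[None, :], loc=mu).sum(axis=1)   # log N(x; mu_j, I)
log_p_mixture = logsumexp(np.log(w) + comp_logpdf)

print(log_p_product, log_p_mixture)
```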
For a more in-depth discussion of the approach and results (and more!): arxiv.org/pdf/2506.05310
June 6, 2025 at 10:11 PM
Finally, we test the manifold hypothesis: what is the local dimensionality around an image? We find that this depends both on the image and on the size of the local neighborhood, and there exist images with both large full-dimensional and small low-dimensional neighborhoods.
June 6, 2025 at 10:11 PM
High probability ≠ typicality: very high-probability images are rare. This is not a contradiction: frequency = probability density *multiplied by volume*, and volume is weird in high dimensions! Also, the log probabilities are Gumbel-distributed, and we don't know why!
June 6, 2025 at 10:11 PM
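A standard Gaussian toy example of the density-times-volume point (my illustration, not from the paper): the mode has by far the highest density, yet typical samples lie on a thin shell at radius ≈ √d where the density is exponentially smaller.

```python
import numpy as np

d = 1000
rng = np.random.default_rng(0)
x = rng.standard_normal((2000, d))                      # typical samples

log_p_mode = -0.5 * d * np.log(2 * np.pi)               # log-density at the mode
log_p_typ = -0.5 * (x ** 2).sum(axis=1) - 0.5 * d * np.log(2 * np.pi)

print("log p at mode:       ", log_p_mode)
print("typical log p (mean):", log_p_typ.mean())        # ≈ log p(mode) - d/2
print("typical radius:      ", np.linalg.norm(x, axis=1).mean(), "≈", np.sqrt(d))
# The mode is ~d/2 nats more probable per unit volume, yet samples near it
# essentially never occur: the volume of a small ball around it is tiny.
```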
These are the highest and lowest probability images in ImageNet64. An interpretation is that -log2 p(x) is the size in bits of the optimal compression of x: higher probability images are more compressible. Also, the probability ratio between these is 10^14,000! 🤯
June 6, 2025 at 10:11 PM
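The arithmetic behind those two statements, under the compression reading: a probability ratio of 10^14,000 translates into a difference of 14,000 · log2(10) ≈ 46,500 bits of optimal code length.

```python
import numpy as np

delta_bits = 14_000 * np.log2(10)    # log2 of the probability ratio 10^14,000
print(delta_bits)                    # ≈ 46,507 bits, i.e. roughly 5.7 kB of code length
```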
But how do we know our probability model is accurate on real data?
In addition to computing cross-entropy/NLL, we show *strong* generalization: models trained on *disjoint* subsets of the data predict the *same* probabilities if the training set is large enough!
June 6, 2025 at 10:11 PM
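A sketch of how such a comparison could be run (hypothetical interface: `model_a`, `model_b`, and `log_prob` are placeholder names, not the paper's actual code): train two models on disjoint halves of the data and compare their log-probabilities image by image on a held-out set.

```python
import numpy as np

def compare_models(model_a, model_b, test_images):
    """model_a and model_b are assumed to be trained on disjoint data halves
    and to expose a log_prob(image) method (hypothetical names)."""
    lp_a = np.array([model_a.log_prob(x) for x in test_images])
    lp_b = np.array([model_b.log_prob(x) for x in test_images])
    # "Strong" generalization: agreement image by image, not just on average.
    corr = np.corrcoef(lp_a, lp_b)[0, 1]
    rmse = np.sqrt(np.mean((lp_a - lp_b) ** 2))
    return corr, rmse
```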
We call this approach "dual score matching". The time derivative constrains the learned energy to satisfy the diffusion equation, which enables recovery of accurate and *normalized* log probability values, even in high-dimensional multimodal distributions.
June 6, 2025 at 10:11 PM
We also propose a simple procedure to obtain good network architectures for the energy U: choose any pre-existing score network s and simply take the inner product with the input image y! We show that this preserves the inductive biases of the base score network: grad_y U ≈ s.
June 6, 2025 at 10:11 PM
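A minimal PyTorch sketch of that construction (the class name and the absence of any extra normalization are my assumptions; see the paper for the exact recipe): wrap a pretrained score network s, define U(y) = <s(y), y>, and read off grad_y U with autograd.

```python
import torch

class InnerProductEnergy(torch.nn.Module):
    """Turn a score network s(y, sigma) into a scalar energy U(y, sigma) via
    an inner product with the input, as described in the thread."""

    def __init__(self, score_net):
        super().__init__()
        self.score_net = score_net

    def forward(self, y, sigma):
        s = self.score_net(y, sigma)              # same shape as y
        return (s * y).flatten(1).sum(dim=1)      # one scalar per image

def energy_grad(energy, y, sigma):
    """grad_y U via autograd; the claim is that this stays close to s(y)."""
    y = y.detach().requires_grad_(True)
    (grad_y,) = torch.autograd.grad(energy(y, sigma).sum(), y)
    return grad_y
```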
How do we train an energy model?
Inspired by diffusion models, we learn the energy of both clean and noisy images along a diffusion. It is optimized via a sum of two score matching objectives, which constrain its derivatives with respect to both the image (space) and the noise level (time).
June 6, 2025 at 10:11 PM
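Here is one way the "sum of two score matching objectives" could be written down (a sketch based on my reading of the thread, assuming the noise level enters as the variance of added Gaussian noise and using standard denoising regression targets for both derivatives; the paper's exact parametrization may differ):

```python
import torch

def dual_score_matching_loss(energy, x, sigma):
    """energy(y, sigma) returns one scalar per batch element (shape [B]);
    x are clean images [B, ...]; sigma are per-example noise levels [B]."""
    d = x[0].numel()
    view = (-1,) + (1,) * (x.dim() - 1)
    noise = torch.randn_like(x)
    y = (x + sigma.view(view) * noise).detach().requires_grad_(True)
    sig = sigma.detach().clone().requires_grad_(True)

    u = energy(y, sig)
    grad_y, grad_sig = torch.autograd.grad(u.sum(), (y, sig), create_graph=True)

    # Space: -grad_y U should match the usual denoising target (x - y) / sigma^2.
    target_y = (x - y) / sigma.view(view) ** 2
    loss_space = ((-grad_y - target_y) ** 2).flatten(1).sum(dim=1)

    # Time: -dU/d(sigma^2) should match ||y - x||^2 / (2 sigma^4) - d / (2 sigma^2),
    # the regression target for d/d(sigma^2) log p under Gaussian smoothing.
    du_dsig2 = grad_sig / (2 * sig)              # chain rule from d/dsigma
    target_t = ((y - x) ** 2).flatten(1).sum(dim=1) / (2 * sigma ** 4) - d / (2 * sigma ** 2)
    loss_time = (-du_dsig2 - target_t) ** 2

    return (loss_space + loss_time).mean()
```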
This also manifests itself in which operator space and norm you're considering. Here you have bounded operators with the operator norm or trace-class operators with the nuclear norm. This matters a lot in infinite dimensions, but also in finite but large dimensions!
April 10, 2025 at 9:32 PM
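A tiny numerical illustration of that dimension dependence (my own toy example): the identity operator has operator norm 1 in any dimension, while its nuclear norm grows linearly with the dimension.

```python
import numpy as np

for d in (10, 100, 1000):
    A = np.eye(d)
    print(d,
          np.linalg.norm(A, ord=2),      # operator norm: largest singular value = 1
          np.linalg.norm(A, ord="nuc"))  # nuclear norm: sum of singular values = d
```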
Absolutely! Their behavior is quite different (e.g., consistency of eigenvalues and eigenvectors in the proportional asymptotic regime). You also want to use different objects to describe them: eigenvalues should be thought of either as a non-increasing sequence or as samples from a distribution.
April 10, 2025 at 9:29 PM
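A quick simulation of the proportional-regime point (standard random-matrix behavior, my own toy setup): with d/n fixed, the sample covariance eigenvalues of white data spread over the Marchenko–Pastur bulk instead of concentrating at the true value 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 1000                          # proportional regime: d/n = 0.5
X = rng.standard_normal((n, d))            # true covariance is the identity
evals = np.linalg.eigvalsh(X.T @ X / n)    # sample covariance eigenvalues

gamma = d / n
print("sample eigenvalue range:", evals.min(), evals.max())
print("Marchenko-Pastur edges: ", (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2)
# All population eigenvalues equal 1, yet the sample ones fill roughly [0.09, 2.91]:
# individual eigenvalues are not consistent in this regime.
```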
Some more random conversation topics:
- what we should do to improve/replace these huge conferences
- replica method and other statphys-inspired high-dim probability (finally trying to understand what the fuss is about)
- textbooks that have been foundational/transformative for your work
December 9, 2024 at 1:02 AM