#machinelearning #deeplearning #probability #statistics #optimization #sampling
Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights
https://arxiv.org/abs/2507.12686
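The phenomenon in the title is easy to poke at numerically. A minimal sketch (my toy, not the paper's setup: Rademacher weights, tanh, width 128 are arbitrary choices): draw the weights of a deep MLP from a deliberately *non-Gaussian* iid law and check that the joint output distribution at a fixed, finite set of inputs still looks Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mlp(d_in, width, depth, rng):
    """Sample one MLP whose weights are iid Rademacher (+/-1) scaled by
    1/sqrt(fan_in), deliberately non-Gaussian, to probe universality."""
    fan_in, layers = d_in, []
    for _ in range(depth):
        layers.append((2 * rng.integers(0, 2, (width, fan_in)) - 1) / np.sqrt(fan_in))
        fan_in = width
    w_out = (2 * rng.integers(0, 2, fan_in) - 1) / np.sqrt(fan_in)

    def f(x):
        h = x
        for W in layers:
            h = np.tanh(W @ h)
        return w_out @ h
    return f

# Joint law of the network's outputs at two fixed inputs, over weight draws.
X = [np.array([1.0, 0.0, -1.0]), np.array([0.5, 1.0, 0.5])]
samples = np.array([[f(x) for x in X]
                    for f in (random_mlp(3, 128, 2, rng) for _ in range(4000))])

# For a Gaussian, marginal skewness and excess kurtosis are both ~ 0.
z = (samples - samples.mean(0)) / samples.std(0)
print("skewness       :", (z ** 3).mean(0))
print("excess kurtosis:", (z ** 4).mean(0) - 3.0)
```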
When applied to the simulated tempering Metropolis-Hastings algorithm for sampling from Gaussian mixture models, we obtain high-accuracy TV guarantees.
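As a picture of what the algorithm does (a toy sketch, not the paper's analysis; the temperature ladder, step sizes, and uniform level weights are illustrative defaults): simulated tempering runs Metropolis-Hastings jointly over the state x and a temperature level k, so the chain hops between far-apart modes while hot and cools back down; draws collected at the coldest level are samples from the target.

```python
import numpy as np

rng = np.random.default_rng(1)

# Target: equal-weight two-mode Gaussian mixture (constants cancel in MH ratios).
def log_target(x):
    return np.logaddexp(-0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2)

betas = np.array([1.0, 0.5, 0.25, 0.1])   # illustrative inverse-temperature ladder
step  = 2.0 / np.sqrt(betas)              # wider random-walk proposals when hotter

x, k = 0.0, 0
samples = []
for _ in range(100_000):
    # (1) Random-walk Metropolis move in x at the current temperature.
    prop = x + step[k] * rng.normal()
    if np.log(rng.random()) < betas[k] * (log_target(prop) - log_target(x)):
        x = prop
    # (2) Propose a move to an adjacent temperature level (symmetric proposal).
    j = k + rng.choice([-1, 1])
    if 0 <= j < len(betas):
        # Uniform level weights still give a valid chain on (x, k); the ideal
        # weights (ratios of normalizing constants) only tune level occupancy.
        if np.log(rng.random()) < (betas[j] - betas[k]) * log_target(x):
            k = j
    if k == 0:                            # keep only draws at the target temperature
        samples.append(x)

samples = np.asarray(samples)
print("fraction in right mode:", (samples > 0).mean())  # expect ~0.5
```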
arxiv.org/abs/2502.07265
Comes with high-accuracy guarantees (i.e., complexity scaling as log(1/eps), where eps is the tolerance) under both exact and inexact oracles for manifold Brownian increments and Riemannian heat kernels.
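In case "Brownian-increment oracle" is unfamiliar: on the unit sphere, one standard *inexact* oracle is a geodesic random walk, a Gaussian step in the tangent space pushed through the exponential map, refined by subdividing the time interval. The sketch below is my illustration of such an oracle, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(2)

def tangent_gaussian(p, t, rng):
    """Sample a N(0, t*I) vector in the tangent space T_p S^2 (p unit-norm)."""
    v = rng.normal(scale=np.sqrt(t), size=3)
    return v - (v @ p) * p                 # project out the normal component

def sphere_exp(p, v):
    """Exponential map on S^2: follow the geodesic from p with velocity v."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return p
    return np.cos(theta) * p + np.sin(theta) * (v / theta)

def brownian_increment(p, t, rng, n_sub=100):
    """Inexact oracle: approximate a duration-t increment of manifold Brownian
    motion by n_sub short geodesic random-walk steps; the weak error shrinks
    as n_sub grows (approximation of the heat kernel)."""
    for _ in range(n_sub):
        p = sphere_exp(p, tangent_gaussian(p, t / n_sub, rng))
    return p

p0 = np.array([0.0, 0.0, 1.0])             # start at the north pole
pts = np.array([brownian_increment(p0, t=0.5, rng=rng) for _ in range(2000)])
# z is a Laplace-Beltrami eigenfunction, so E[z_t] = exp(-t) ~ 0.607 at t = 0.5.
print("mean height <z>:", pts[:, 2].mean())
```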
arxiv.org/abs/2409.08469
Only theory, no deep learning (although the techniques are useful for DL), and no experiments in this time of scale and AGI :)
arxiv.org/abs/2412.17181
We develop Gaussian approximation bounds and non-asymptotically valid confidence intervals for matching-based Average Treatment Effect (ATE) estimators.
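For readers who want the object in code (an illustrative toy, not the paper's estimator analysis: the simulated data, the single-match choice, and the naive plug-in variance are all my own choices): 1-nearest-neighbor matching on a scalar covariate, with the usual normal-approximation interval whose non-asymptotic validity is exactly what the paper establishes.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic observational data: outcome depends on covariate x and treatment.
n = 2000
x = rng.normal(size=n)
propensity = 1.0 / (1.0 + np.exp(-x))          # treatment more likely if x is large
d = (rng.random(n) < propensity).astype(int)   # treatment indicator
tau = 2.0                                      # true ATE
y = 1.0 + 3.0 * x + tau * d + rng.normal(size=n)

def matching_effects(x, d, y):
    """1-NN matching on the covariate: impute each unit's missing
    counterfactual with its nearest neighbor from the other arm."""
    treated, control = np.where(d == 1)[0], np.where(d == 0)[0]
    effects = np.empty(len(x))
    for i in range(len(x)):
        pool = control if d[i] == 1 else treated
        j = pool[np.argmin(np.abs(x[pool] - x[i]))]
        y1, y0 = (y[i], y[j]) if d[i] == 1 else (y[j], y[i])
        effects[i] = y1 - y0
    return effects

effects = matching_effects(x, d, y)
ate_hat = effects.mean()
# Naive normal-approximation CI (ignores matching-induced dependence; the
# paper's intervals are the ones with finite-sample validity).
se = effects.std(ddof=1) / np.sqrt(len(effects))
print(f"ATE estimate: {ate_hat:.3f}  95% CI: "
      f"[{ate_hat - 1.96 * se:.3f}, {ate_hat + 1.96 * se:.3f}]  (truth: 2.0)")
```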
OpenAI: Hold my gazillion parameter Sora model - I’ll make the elephant out of leaves and teach it to dance.
youtu.be/4QG_MGEBQow?...
cc:
@yisongyue.bsky.social
The Merged Staircase Property (MSP), proposed by Abbe et al. (2022), completely characterizes which functions are learnable by SGD-trained two-layer neural networks (NNs) in the regime where the mean-field approximation for SGD holds.
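To make "staircase" concrete (my toy, not Abbe et al.'s mean-field dynamics): f(x) = x1 + x1*x2 on the hypercube satisfies the MSP, since the new coordinate x2 only appears multiplied by an already-present monomial x1, and a wide two-layer ReLU net trained by online SGD can climb that step. Width, step size, and sample count below are arbitrary, untuned choices.

```python
import numpy as np

rng = np.random.default_rng(4)

# Staircase target on the hypercube: x1 first, then x1*x2 one step up.
d, m, lr, steps = 10, 512, 0.02, 100_000
target = lambda x: x[0] + x[0] * x[1]

# Two-layer ReLU net in mean-field scaling: yhat = (1/m) * a . relu(W x + b).
W = rng.normal(size=(m, d)) / np.sqrt(d)
b = 0.1 * rng.normal(size=m)
a = rng.normal(size=m)

for _ in range(steps):
    x = 2.0 * rng.integers(0, 2, d) - 1.0      # uniform on {-1, +1}^d
    pre = W @ x + b
    h = np.maximum(pre, 0.0)
    r = a @ h / m - target(x)                  # residual of the squared loss
    # Online SGD on both layers; the 1/m output scaling is absorbed into lr,
    # as in the mean-field parameterization.
    grad_pre = r * a * (pre > 0)
    a -= lr * r * h
    W -= lr * np.outer(grad_pre, x)
    b -= lr * grad_pre

# Fresh test points: the zero predictor has MSE ~ 2.0; much lower means the
# net picked up both steps of the staircase.
Xte = 2.0 * rng.integers(0, 2, (1000, d)) - 1.0
preds = np.maximum(Xte @ W.T + b, 0.0) @ a / m
mse = np.mean((preds - (Xte[:, 0] + Xte[:, 0] * Xte[:, 1])) ** 2)
print("test MSE:", mse)
```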