Jeremias Sulam
@jsulam.bsky.social
Assistant Prof. @ JHU 🇦🇷🇺🇸 Mathematics of Data & Biomedical Data Science
jsulam.github.io
Many more details on the derivation, intuition, implementation, and proofs can be found in our paper arxiv.org/pdf/2507.08956, with the amazing Zhenghan Fang, Sam Buchanan, and Mateo Diaz at @jhu.edu @hopkinsdsai.bsky.social
2) In theory, we show that prox-diff sampling requires only O(d/sqrt(eps)) steps to produce a distribution eps-away (in KL) from the target one (assuming an oracle prox), faster than the O(d/eps) steps of the score-based version. Technical assumptions differ across all these papers, though, so an exact comparison is hard (10/n)
1) In practice, ProxDM can produce samples from the data distribution much faster than comparable score-based methods like DDPM (Ho et al., 2020), and is even competitive with their ODE alternatives (which are typically much faster than SDE samplers) (9/n)
In this way, we generalize the Proximal Matching Loss from (Fang et al., 2024) to learn time-specific proximal operators for the densities at each discrete time step. The result is Proximal Diffusion Models: sampling with proximal operators instead of the score. This has two main advantages: (8/n)
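For intuition, here is a minimal PyTorch sketch of what a time-conditioned proximal-matching training step could look like. The names (prox_net, sigma_t, gamma) and the exact Gaussian-kernel loss shape are illustrative placeholders, not the paper's precise recipe:

```python
import torch

def proximal_matching_step(prox_net, x0, t, sigma_t, gamma):
    """One schematic training step for a time-conditioned proximal network.

    As gamma -> 0, the kernel loss below rewards exact recovery of x0
    rather than an MMSE-style average -- the mechanism that makes the
    learned map behave like a prox (MAP denoiser) instead of a mean.
    Assumes x0 has shape (batch, ...).
    """
    y = x0 + sigma_t * torch.randn_like(x0)          # noisy sample at time t
    x_hat = prox_net(y, t)                           # candidate prox output
    dist2 = ((x_hat - x0) ** 2).flatten(1).sum(-1)   # squared error per sample
    loss = (1.0 - torch.exp(-dist2 / gamma ** 2)).mean()
    return loss
```

In practice, gamma would be annealed toward zero over the course of training.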
So, in order to implement a proximal/backward version of diffusion models, we need a (cheap!) way of solving this optimization problem, i.e., computing the proximal of the log densities at every single time step. If only there were a way… oh, in come Learned Proximal Networks (7/n)
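A minimal sketch of the core idea behind Learned Proximal Networks (the architecture below is a simplified stand-in, not the paper's exact design): parameterize the map as the gradient of an input-convex scalar potential, since gradients of convex functions are precisely the maps that arise as proximal operators of some function:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvexPotential(nn.Module):
    """Tiny input-convex network psi(y): convexity in y holds because
    hidden states are combined with non-negative weights and the
    activations are convex and non-decreasing."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.Wy0 = nn.Linear(dim, hidden)
        self.Wy1 = nn.Linear(dim, hidden)
        self.Wz1 = nn.Linear(hidden, hidden, bias=False)
        self.out = nn.Linear(hidden, 1, bias=False)

    def forward(self, y):
        z = F.softplus(self.Wy0(y))
        z = F.softplus(self.Wy1(y) + F.linear(z, self.Wz1.weight.clamp(min=0)))
        return F.linear(z, self.out.weight.clamp(min=0)).squeeze(-1)

def learned_prox(psi, y):
    """f(y) = grad psi(y): the gradient of a convex potential, which is
    guaranteed to be the prox of some (implicitly learned) function."""
    y = y.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(psi(y).sum(), y, create_graph=True)
    return grad
```

Training such a network with a matching loss as above then recovers the prox of a (learned) negative log density.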
What are proximal operators? You can think of them as generalizations of projection operators. For a given (proximable) functional \rho(x), its proximal is defined as the solution of a simple optimization problem: (6/n)
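$$\mathrm{prox}_{\rho}(v) \;=\; \operatorname*{argmin}_{x}\; \rho(x) + \tfrac{1}{2}\|x - v\|_2^2$$

Taking \rho to be the indicator function of a set recovers the usual projection onto that set.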
Backward discretization of differential equations has long been studied (cf. gradient descent vs. the proximal point method). Let's go ahead and discretize the same SDE, but backwards! One problem: the update is defined implicitly... But it does admit a closed-form expression in terms of proximal operators! (5/n)
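Concretely, for the gradient flow \dot{x} = -\nabla f(x), the explicit (forward) Euler step is gradient descent, while the implicit (backward) Euler step is exactly the proximal point update:

$$x_{k+1} = x_k - \gamma\,\nabla f(x_{k+1}) \quad\Longleftrightarrow\quad x_{k+1} = \mathrm{prox}_{\gamma f}(x_k)$$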
Crucially, this step relies on being able to compute the score function. Luckily, minimum mean squared error (MMSE) denoisers can do just that (at least asymptotically). But couldn't there be a different discretization strategy for this SDE, you ask? Great question! Let's go *back*... (4/n)
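This is Tweedie's formula: if y = x + \sigma z with z ~ N(0, I), and D(y) = E[x | y] denotes the MMSE denoiser, then

$$\nabla_y \log p_\sigma(y) \;=\; \frac{D(y) - y}{\sigma^2},$$

so a well-trained denoiser hands you the score (asymptotically) for free.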
While elegant in continuous time, the SDE must be discretized to be implemented in practice. In DMs, this has always been done through forward discretization (e.g., Euler–Maruyama), which combines a gradient step along the log density of the data distribution at the discrete time t (the *score*) with Gaussian noise: (3/n)
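Schematically, one such update looks like the following (the exact drift and step size depend on the chosen SDE; this is just the generic shape):

$$x_{k+1} = x_k + \gamma\,\nabla \log p_{t_k}(x_k) + \sqrt{2\gamma}\, z_k, \qquad z_k \sim \mathcal{N}(0, I)$$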
First, a (very) brief overview of diffusion models (DMs). DMs work by simulating a process that converts samples from a simple distribution (random noise) into samples from a target distribution of interest. This process is modeled mathematically with a stochastic differential equation (SDE) (2/n)
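In the standard score-based formulation, a forward SDE gradually turns data into noise, and its time reversal, whose drift involves the score \nabla_x \log p_t, turns noise back into data:

$$dx = f(x,t)\,dt + g(t)\,dw \quad \text{(forward)}, \qquad dx = \left[f(x,t) - g(t)^2\,\nabla_x \log p_t(x)\right]dt + g(t)\,d\bar{w} \quad \text{(reverse)}$$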