Foundations of Diffusion Models and How DDPM Works

Diffusion models have emerged as a groundbreaking approach in the landscape of deep generative models, offering a robust alternative to traditional methods like Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). These models are based on a stochastic process that gradually transforms data into a distribution of pure noise and then learns to reverse this process to generate new samples from the noise. One of the most prominent types of diffusion models is the Denoising Diffusion Probabilistic Model (DDPM), which has shown remarkable success in generating high-quality samples across various domains.

Understanding Diffusion Models

At their core, diffusion models operate through two phases: the forward diffusion (or noising) phase and the reverse diffusion (or denoising) phase. The forward diffusion phase incrementally adds Gaussian noise to the data over a series of steps, eventually converting the data into a distribution that resembles Gaussian noise. Mathematically, this process is modeled as a Markov chain, where each step adds a small amount of noise to the data, following a predefined schedule.
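
To make the forward phase concrete, here is a minimal PyTorch sketch of one noising step and of running the full chain; the linear beta schedule and the variable names are illustrative assumptions, not taken from any particular implementation.

```python
import torch

torch.manual_seed(0)

# Illustrative linear noise schedule: beta_t controls how much noise step t adds.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas

def forward_step(x_prev, t):
    """One step of the forward Markov chain, q(x_t | x_{t-1})."""
    eps = torch.randn_like(x_prev)
    return torch.sqrt(alphas[t]) * x_prev + torch.sqrt(1.0 - alphas[t]) * eps

# Toy example: push an 8-dimensional "data point" through all T steps.
x = torch.ones(8)
for t in range(T):
    x = forward_step(x, t)
# After enough steps, x is statistically indistinguishable from standard Gaussian noise.
```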

The reverse diffusion phase, on the other hand, is where the generative modeling comes into play. The model learns to reverse the noising process, starting from the noise distribution and progressively denoising it to recover the original data distribution. This phase is where the deep learning model, typically a neural network, is trained to predict the noise that was added at each step of the forward process and remove it, effectively learning the data distribution.

Denoising Diffusion Probabilistic Models (DDPMs)

DDPMs are a specific instantiation of diffusion models that frame the generative process as a probabilistic model. The key idea is to model the reverse diffusion process as a sequence of conditional distributions, each one, $p_\theta(x_{t-1} \mid x_t)$, predicting the slightly less noisy data at the previous timestep given the noised data at the current timestep.

Technical Details of DDPMs

  1. Forward Process: The forward process in DDPM is defined as a sequence of diffusion steps that gradually add noise to the data. This can be formally expressed as:
    $x_{t} = \sqrt{\alpha_t}\,x_{t-1} + \sqrt{1 - \alpha_t}\,\epsilon,$
    where $x_{t-1}$ is the data at the previous timestep, $\epsilon$ is sampled from a standard Gaussian distribution, and $\alpha_t$ is a coefficient that determines how much noise is added at each step. The sequence of $\alpha_t$ values is chosen so that the data becomes progressively noisier in a controlled manner.
  2. Reverse Process: The reverse process is where the model learns to generate data. At each step, the model predicts the noise $\epsilon$ that was added in the forward process and uses this prediction to compute the denoised data at the previous timestep. The prediction is made by a neural network, typically parameterized as $\epsilon_\theta(x_t, t)$, where $x_t$ is the noised data at timestep $t$ and $\theta$ represents the network parameters.
  3. Training Objective: Training a DDPM is framed as minimizing the difference between the predicted noise and the actual noise added during the forward process, typically a mean squared error of the form $\mathbb{E}\big[\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2\big]$, which encourages the model to accurately invert the forward diffusion (see the training sketch after this list).
  4. Sampling: To generate new samples, a DDPM starts with noise drawn from a Gaussian distribution and iteratively applies the reverse process. At each step, it uses the trained network to predict and remove noise, gradually denoising the sample until it resembles a draw from the target data distribution (see the sampling sketch below).
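
The training objective can be made concrete with a short sketch. It relies on the standard closed-form result that composing the per-step updates gives $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon$ with $\bar{\alpha}_t = \prod_{s \le t} \alpha_s$, so a training example can be noised to any timestep in one shot. The network interface `eps_model(x_t, t)` and the schedule values are illustrative assumptions.

```python
import torch

# Schedule reused from the earlier forward-process sketch (illustrative values).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)   # \bar{alpha}_t

def ddpm_loss(eps_model, x0):
    """Simplified DDPM objective: MSE between true and predicted noise."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))                                  # random timestep per example
    eps = torch.randn_like(x0)                                     # the noise the network must recover
    a_bar = alphas_cumprod[t].view(b, *([1] * (x0.dim() - 1)))     # broadcast over data dimensions
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * eps   # noise x0 directly to step t
    return torch.mean((eps - eps_model(x_t, t)) ** 2)
```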

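A matching sampling sketch, continuing with the same schedule and the illustrative `eps_model`: starting from pure Gaussian noise, each iteration subtracts the predicted noise contribution and, except at the final step, re-injects a small amount of fresh noise.

```python
@torch.no_grad()
def ddpm_sample(eps_model, shape):
    """Ancestral sampling: run the learned reverse process from t = T-1 down to 0."""
    alphas = 1.0 - betas
    x = torch.randn(shape)                                        # x_T ~ N(0, I)
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        eps_hat = eps_model(x, t_batch)                           # predicted noise at this step
        coef = betas[t] / torch.sqrt(1.0 - alphas_cumprod[t])
        mean = (x - coef * eps_hat) / torch.sqrt(alphas[t])       # estimate of the reverse-step mean
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise                   # no noise added at the final step
    return x
```
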
Another family of diffusion models is Score-Based Generative Models (SGMs), also known as score-matching models, which offer a powerful framework for generating high-quality data samples. Unlike generative models that directly model the data distribution or the transformation from noise to data, SGMs model the gradient of the log probability density (the score) of the data distribution. This subtle yet consequential shift lets SGMs generate samples through a process that iteratively refines noise into structured data, guided by the learned score function.

Foundations of Score-Based Modeling

The concept of score-based generative modeling is grounded in the mathematical definition of the score: the gradient of the log probability density with respect to the data, $\nabla_x \log p(x)$. The score function points in the direction in which the probability density increases, providing a way to gradually move samples toward more likely configurations under the data distribution.

How SGMs Work

SGMs operate by learning an estimate of the score function of the data distribution across different noise levels. This learning process is facilitated by gradually adding noise to the data, creating a family of distributions that interpolate between the data distribution and a known noise distribution. The model then learns to approximate the score of these intermediate distributions.

  1. Noise Addition: Noise is added to the data in a controlled manner to create a sequence of progressively noisier versions of the data. This sequence spans from the original data distribution to a tractable noise distribution, typically Gaussian.
  2. Score Network: A neural network, referred to as the score network, is trained to estimate the score of the data at each noise level. Because the true score of the noised data is not directly available, the network is trained against a tractable regression target derived from the known Gaussian noise, a procedure known as denoising score matching.
  3. Sampling via Langevin Dynamics: To generate samples, SGMs use Langevin dynamics, which iteratively refines samples by moving them in the direction of the estimated score while injecting a small amount of fresh noise. The process starts from pure noise and progressively denoises it to produce samples from the target distribution (a sketch follows this list).
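
Below is a hedged sketch of annealed Langevin dynamics, assuming a trained score network with the illustrative interface `score_net(x, sigma)` and a sequence of noise levels `sigmas` ordered from largest to smallest; the step-size rule follows the common practice of scaling with the squared noise level.

```python
import torch

@torch.no_grad()
def langevin_sample(score_net, shape, sigmas, steps_per_level=100, step_size=2e-5):
    """Annealed Langevin dynamics: follow the estimated score while injecting noise."""
    x = torch.randn(shape)                                        # start from pure noise
    for sigma in sigmas:                                          # anneal from coarse to fine noise levels
        alpha = step_size * (sigma / sigmas[-1]) ** 2             # larger steps at larger noise levels
        for _ in range(steps_per_level):
            z = torch.randn_like(x)
            score = score_net(x, sigma * torch.ones(shape[0]))    # estimate of grad_x log p_sigma(x)
            x = x + 0.5 * alpha * score + (alpha ** 0.5) * z      # gradient step plus fresh noise
    return x
```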

Training SGMs

The training of SGMs involves optimizing the parameters of the score network so that it accurately estimates the score across the different noise levels. This is typically framed as minimizing the Fisher divergence between the true and estimated scores; because the true score is unknown, tractable surrogates such as Hyvärinen's score matching objective and, most commonly, its denoising variant are used in practice.
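
Concretely, denoising score matching replaces the intractable true score with the score of the Gaussian noising kernel, which is known in closed form. A minimal sketch, assuming the same illustrative `score_net(x, sigma)` interface, a tensor `sigmas` of noise levels, and the common $\sigma^2$ weighting so that all levels contribute comparably:

```python
import torch

def dsm_loss(score_net, x0, sigmas):
    """Denoising score matching: regress onto the score of the Gaussian noising kernel."""
    b = x0.shape[0]
    idx = torch.randint(0, len(sigmas), (b,))                     # random noise level per example
    sigma = sigmas[idx].view(b, *([1] * (x0.dim() - 1)))          # broadcast over data dimensions
    noise = torch.randn_like(x0)
    x_noisy = x0 + sigma * noise                                  # perturb the data at level sigma
    target = -noise / sigma                                       # = grad log N(x_noisy | x0, sigma^2 I)
    pred = score_net(x_noisy, sigmas[idx])
    per_example = torch.sum((pred - target) ** 2, dim=tuple(range(1, x0.dim())))
    return torch.mean(sigmas[idx] ** 2 * per_example)             # sigma^2 weighting across levels
```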

Conclusion

DDPMs represent a significant advancement in generative modeling, providing a powerful framework for understanding and manipulating complex data distributions. By leveraging a stochastic process that models the addition and removal of noise, DDPMs, together with their score-based counterparts, can generate high-quality, diverse samples across a wide range of data domains. The mathematical elegance and practical effectiveness of these models underscore their foundational role in the ongoing evolution of diffusion models.