Learning Data Structure

The primary goal of generative modelling is to understand the underlying structure of high-dimensional data, such as images.

  • Data as Variables: An image is treated as a high-dimensional random variable $x \in \mathbb{R}^{H \times W \times C}$.
  • Complexity: Images follow a “complicated” distribution $p_{\text{data}}(x)$. Because most random configurations of pixels look like noise, we aim to learn a compressed representation that captures the meaningful structure of the data.
  • The Generative Task: Use a neural network to approximate the true distribution $p_{\text{data}}(x)$ with a model $p_\theta(x)$. Once learned, we can sample (generate) new data points: $x \sim p_\theta(x)$.

Density Estimation vs. Sampling

  • Density Estimation: Fitting a probabilistic model to data to learn its parameters $\theta$ such that $p_\theta(x) \approx p_{\text{data}}(x)$.
  • Sampling: Training a system that allows us to generate new samples that look like they came from the training set.
  • Challenges: Data is extremely high-dimensional, and dimensions are correlated in complex, non-linear ways.
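The distinction between the two tasks can be made concrete in one dimension. A minimal sketch (toy Gaussian data and closed-form maximum likelihood, NumPy only — an illustration, not part of the lecture): density estimation fits the parameters $\theta = (\mu, \sigma)$, and sampling then draws new points from the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Training set": samples from an unknown 1-D distribution.
data = rng.normal(loc=2.0, scale=0.5, size=10_000)

# Density estimation: fit model parameters theta = (mu, sigma) so that
# p_theta(x) matches the data (Gaussian MLE has a closed form).
mu_hat, sigma_hat = data.mean(), data.std()

# Sampling: draw new points from the fitted model p_theta.
new_samples = rng.normal(mu_hat, sigma_hat, size=5)

print(mu_hat, sigma_hat)
```

With enough data the fitted parameters recover the true ones; for images, no such closed-form fit exists, which is what motivates the adversarial approach below.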

Generative Adversarial Networks (GANs)

Introduced by Goodfellow et al. (2014), GANs use a game-theoretic approach to generative modelling.

Architecture

The system consists of two networks competing against each other.

  1. The Generator $G$: A “counterfeiter” that learns to map a latent noise vector $z$ (drawn from a simple prior distribution such as $\mathcal{N}(0, I)$) to the data space.
    • Goal: Create “fake” samples $G(z)$ that are indistinguishable from real data.
  2. The Discriminator $D$: A “detective” or binary classifier.
    • Goal: Predict the probability $D(x)$ that an input $x$ is “real” (from the training set) rather than “fake” (from the generator).
    • $D(x) = 1$: Certain the input is real
    • $D(x) = 0$: Certain the input is fake
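A shape-level sketch of the two networks, assuming tiny fully-connected layers with random, untrained weights (purely illustrative; the dimensions and weight matrices are hypothetical, not the lecture's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, data_dim, hidden = 16, 64, 32

# Hypothetical toy weights; a real GAN learns these by backpropagation.
Wg1, Wg2 = rng.normal(size=(latent_dim, hidden)), rng.normal(size=(hidden, data_dim))
Wd1, Wd2 = rng.normal(size=(data_dim, hidden)), rng.normal(size=(hidden, 1))

def generator(z):
    # Maps latent noise z ~ N(0, I) to a point in data space.
    h = np.tanh(z @ Wg1)
    return np.tanh(h @ Wg2)

def discriminator(x):
    # Outputs P(x is real) in (0, 1): 1 -> certain real, 0 -> certain fake.
    h = np.tanh(x @ Wd1)
    return 1.0 / (1.0 + np.exp(-(h @ Wd2)))

z = rng.normal(size=(8, latent_dim))   # batch of latent vectors
fake = generator(z)                    # shape (8, data_dim)
scores = discriminator(fake)           # shape (8, 1), each in (0, 1)
print(fake.shape, scores.shape)
```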

The Min-Max Game

The training is a two-player min-max game. The objective function is defined as:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

  • Discriminator Objective: Maximise the probability of assigning the correct label to both real and fake samples.
  • Generator Objective: Minimise the probability that the Discriminator identifies its samples as fake.
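The value function can be estimated on finite batches of real and generated samples. A small NumPy sketch (the `value_fn` helper and the "blind" discriminator are hypothetical names for illustration):

```python
import numpy as np

def value_fn(D, real_batch, fake_batch):
    # V(D, G) = E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 - D(G(z)))],
    # estimated on finite batches of real and generated samples.
    return (np.mean(np.log(D(real_batch)))
            + np.mean(np.log(1.0 - D(fake_batch))))

# An uninformative discriminator (D = 1/2 everywhere) yields V = -log 4,
# the value attained at the global optimum where p_g = p_data.
blind_d = lambda x: np.full(len(x), 0.5)
real = np.zeros(4)
fake = np.ones(4)
print(value_fn(blind_d, real, fake))  # -log 4 ~ -1.386
```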

Training GANs: Theory vs. Practice

Theoretical Algorithm

  1. Train $D$ for $k$ iterations (usually $k = 1$) to maximise the log-likelihood of correctly classifying real vs. fake samples.
  2. Train $G$ for one iteration to minimise the probability that $D$ correctly identifies its samples as fake.
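The alternating scheme can be run end-to-end on a toy 1-D problem. Everything here is a hypothetical illustration: Gaussian data, a linear generator $G(z) = a + bz$, a logistic discriminator $D(x) = \sigma(wx + c)$, and hand-derived gradients; the generator step uses the non-saturating objective discussed under "Practical Generator Loss".

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

# Toy setup: real data ~ N(3, 1); generator G(z) = a + b*z with z ~ N(0, 1);
# discriminator D(x) = sigmoid(w*x + c).
a, b = 0.0, 1.0          # generator parameters
w, c = 0.0, 0.0          # discriminator parameters
lr, k, steps, batch = 0.05, 1, 500, 64

for _ in range(steps):
    # Step 1: update D for k iterations
    # (gradient ascent on E[log D(x)] + E[log(1 - D(G(z)))]).
    for _ in range(k):
        x = rng.normal(3.0, 1.0, batch)
        gz = a + b * rng.normal(size=batch)
        d_real, d_fake = sigmoid(w * x + c), sigmoid(w * gz + c)
        w += lr * np.mean((1 - d_real) * x - d_fake * gz)
        c += lr * np.mean((1 - d_real) - d_fake)
    # Step 2: update G once, using the non-saturating objective
    # (gradient ascent on E[log D(G(z))]).
    z = rng.normal(size=batch)
    d_fake = sigmoid(w * (a + b * z) + c)
    a += lr * np.mean((1 - d_fake) * w)
    b += lr * np.mean((1 - d_fake) * w * z)

print(a)  # the generator's mean should have drifted toward the data mean of 3
```

Even in this four-parameter game the dynamics oscillate around the equilibrium rather than converging cleanly, which previews why GAN training is notoriously unstable.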

Practical Generator Loss

In early training, $G$ is often very poor, and $D$ can reject fake samples with high confidence ($D(G(z)) \approx 0$). This leads to vanishing gradients for the theoretical generator loss.

  • Theoretical Loss: $\min_G \; \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$
  • Practical Loss: $\max_G \; \mathbb{E}_{z \sim p_z}[\log D(G(z))]$ (the “non-saturating” loss)
    • This provides much stronger gradients early in training.
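The gradient gap is easy to check numerically. Taking derivatives with respect to the discriminator's logit $u$, where $D = \sigma(u)$ (a sketch of the standard argument, with a hypothetical logit value):

```python
import numpy as np

sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

# Early in training D confidently rejects fakes: D(G(z)) = sigmoid(u) ~ 0
# for a very negative logit u.
u = -8.0
d = sigmoid(u)                # ~ 3e-4

# Theoretical loss  log(1 - D(G(z))):  d/du = -sigmoid(u)  -> vanishes.
grad_theoretical = -d

# Practical loss  -log D(G(z)):  d/du = sigmoid(u) - 1  -> stays near -1.
grad_practical = d - 1.0

print(abs(grad_theoretical), abs(grad_practical))
```

The theoretical loss gives the generator an almost-zero learning signal exactly when it most needs one; the practical loss keeps the gradient close to full strength.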

Theoretical Optimality

  • Optimal Discriminator: For a fixed Generator, the optimal Discriminator is $D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$
  • Global Optimum: When $p_g = p_{\text{data}}$, the optimal Discriminator is $D^*(x) = \frac{1}{2}$ everywhere, meaning it can no longer distinguish between real and fake.
  • Divergence Minimisation: Theoretically, the GAN objective minimises the Jensen–Shannon (JS) divergence between the data distribution $p_{\text{data}}$ and the model distribution $p_g$.
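The Goodfellow et al. (2014) identity behind this claim, $C(G) = -\log 4 + 2\,\mathrm{JSD}(p_{\text{data}} \,\|\, p_g)$ where $C(G)$ is the value under the optimal discriminator, can be verified numerically on small discrete distributions (the probability values below are arbitrary toy choices):

```python
import numpy as np

# Two discrete distributions standing in for p_data and p_g (toy values).
p_data = np.array([0.5, 0.3, 0.2])
p_g    = np.array([0.2, 0.3, 0.5])

# Optimal discriminator for a fixed G:  D*(x) = p_data(x) / (p_data(x) + p_g(x)).
d_star = p_data / (p_data + p_g)

# Value of the objective under D*:
# C(G) = E_{p_data}[log D*] + E_{p_g}[log(1 - D*)]
c_g = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1.0 - d_star))

# Jensen-Shannon divergence between p_data and p_g.
m = 0.5 * (p_data + p_g)
kl = lambda p, q: np.sum(p * np.log(p / q))
jsd = 0.5 * kl(p_data, m) + 0.5 * kl(p_g, m)

print(np.isclose(c_g, -np.log(4.0) + 2.0 * jsd))  # prints True
```

Since JSD is minimised exactly when $p_g = p_{\text{data}}$, the global optimum of the game recovers the data distribution.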

Conditional Generative Models (cGANs)

While basic GANs learn the marginal distribution $p(x)$, Conditional GANs learn the conditional distribution $p(x \mid y)$, where $y$ is side information such as a class label or an input image.

  • Application: Image-to-image translation (e.g., turning a sketch into a photo, restoration, or un-cropping).
  • Why cGANs over Regression (MSE)?
    • Regression with Mean Squared Error (MSE) finds the Expected Value $\mathbb{E}[x \mid y]$, which acts as a “compromise” between all possible outcomes. This results in blurry images.
    • Conditional models account for uncertainty and can sample from the full distribution $p(x \mid y)$, producing sharper, more realistic results.
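The "compromise" effect is visible even in a scalar toy example (a hypothetical bimodal outcome, NumPy only):

```python
import numpy as np

rng = np.random.default_rng(0)

# For one conditioning input y, suppose two equally likely sharp outcomes
# (e.g. a pixel that is either dark or bright): x = -1 or x = +1.
samples = rng.choice([-1.0, 1.0], size=10_000)

# The MSE-optimal point prediction is the conditional mean E[x | y] = 0:
# a "compromise" matching neither mode (a blurry grey pixel).
mse_prediction = samples.mean()

# A conditional generative model instead samples from the full distribution,
# so every draw lands on one of the sharp, plausible modes.
draw = rng.choice([-1.0, 1.0])

print(round(mse_prediction, 2), draw)
```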

Evolution of GANs

  • GAN (2014): Original multilayer perceptron approach.
  • DCGAN (2016): Deep Convolutional GANs. Introduced architectural constraints (e.g., removing pooling, using strided convolutions) to make training stable.
    • Allows Latent Space Interpolation: Decoding values of $z$ interpolated between two points results in “smooth” transitions between generated images.
  • BigGAN (2019): Scaling GANs for high-fidelity natural image synthesis at high resolutions (e.g., $512 \times 512$).