Learning Data Structure

The primary goal of generative modelling is to understand the underlying structure of high-dimensional data, such as images.

  • Data as Variables: An image is treated as a high-dimensional random variable $x \in \mathbb{R}^{H \times W \times C}$.
  • Complexity: Images follow a “complicated” distribution $p_{\text{data}}(x)$. Because most random configurations of pixels look like noise, we aim to learn a compressed representation that captures the meaningful structure of the data.
  • The Generative Task: Use a neural network to approximate the true distribution $p_{\text{data}}(x)$ with a model $p_\theta(x)$. Once learned, we can sample (generate) new data points: $x \sim p_\theta(x)$.

Density Estimation vs. Sampling

  • Density Estimation: Fitting a probabilistic model to data to learn its parameters $\theta$ such that $p_\theta(x) \approx p_{\text{data}}(x)$.
  • Sampling: Training a system that allows us to generate new samples that look like they came from the training set.
  • Challenges: Data is extremely high-dimensional, and dimensions are correlated in complex, non-linear ways.
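The distinction between the two tasks can be made concrete in one dimension. A minimal sketch (toy Gaussian data and closed-form maximum likelihood, NumPy only — an illustration, not part of the lecture): density estimation fits the parameters $\theta = (\mu, \sigma)$, and sampling then draws new points from the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Training set": samples from an unknown 1-D distribution.
data = rng.normal(loc=2.0, scale=0.5, size=10_000)

# Density estimation: fit model parameters theta = (mu, sigma) so that
# p_theta(x) matches the data (Gaussian MLE has a closed form).
mu_hat, sigma_hat = data.mean(), data.std()

# Sampling: draw new points from the fitted model p_theta.
new_samples = rng.normal(mu_hat, sigma_hat, size=5)

print(mu_hat, sigma_hat)
```

With enough data the fitted parameters recover the true ones; for images, no such closed-form fit exists, which is what motivates the adversarial approach below.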

Generative Adversarial Networks (GANs)

Introduced by Goodfellow et al. (2014), GANs use a game-theoretic approach to generative modelling.

Architecture

The system consists of two networks competing against each other.

  1. The Generator $G$: A “counterfeiter” that learns to map a latent noise vector $z$ (drawn from a simple prior distribution such as $\mathcal{N}(0, I)$) to the data space.
    • Goal: Create “fake” samples $G(z)$ that are indistinguishable from real data.
  2. The Discriminator $D$: A “detective” or binary classifier.
    • Goal: Predict the probability $D(x)$ that an input $x$ is “real” (from the training set) rather than “fake” (from the generator).
    • $D(x) = 1$: Certain the input is real
    • $D(x) = 0$: Certain the input is fake
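A shape-level sketch of the two networks, assuming tiny fully-connected layers with random, untrained weights (purely illustrative; the dimensions and weight matrices are hypothetical, not the lecture's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, data_dim, hidden = 16, 64, 32

# Hypothetical toy weights; a real GAN learns these by backpropagation.
Wg1, Wg2 = rng.normal(size=(latent_dim, hidden)), rng.normal(size=(hidden, data_dim))
Wd1, Wd2 = rng.normal(size=(data_dim, hidden)), rng.normal(size=(hidden, 1))

def generator(z):
    # Maps latent noise z ~ N(0, I) to a point in data space.
    h = np.tanh(z @ Wg1)
    return np.tanh(h @ Wg2)

def discriminator(x):
    # Outputs P(x is real) in (0, 1): 1 -> certain real, 0 -> certain fake.
    h = np.tanh(x @ Wd1)
    return 1.0 / (1.0 + np.exp(-(h @ Wd2)))

z = rng.normal(size=(8, latent_dim))   # batch of latent vectors
fake = generator(z)                    # shape (8, data_dim)
scores = discriminator(fake)           # shape (8, 1), each in (0, 1)
print(fake.shape, scores.shape)
```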

The Min-Max Game

The training is a two-player min-max game. The objective function is defined as:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

  • Discriminator Objective: Maximise the probability of assigning the correct label to both real and fake samples.
  • Generator Objective: Minimise the probability that the Discriminator identifies its samples as fake.
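The value function can be estimated on finite batches of real and generated samples. A small NumPy sketch (the `value_fn` helper and the "blind" discriminator are hypothetical names for illustration):

```python
import numpy as np

def value_fn(D, real_batch, fake_batch):
    # V(D, G) = E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 - D(G(z)))],
    # estimated on finite batches of real and generated samples.
    return (np.mean(np.log(D(real_batch)))
            + np.mean(np.log(1.0 - D(fake_batch))))

# An uninformative discriminator (D = 1/2 everywhere) yields V = -log 4,
# the value attained at the global optimum where p_g = p_data.
blind_d = lambda x: np.full(len(x), 0.5)
real = np.zeros(4)
fake = np.ones(4)
print(value_fn(blind_d, real, fake))  # -log 4 ~ -1.386
```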

Training GANs: Theory vs. Practice

Theoretical Algorithm

  1. Train $D$ for $k$ iterations (usually $k = 1$) to maximise the log-likelihood of correctly classifying real vs. fake samples.
  2. Train $G$ for one iteration to minimise the probability that $D$ correctly identifies its samples as fake.
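The alternating scheme can be run end-to-end on a toy 1-D problem. Everything here is a hypothetical illustration: Gaussian data, a linear generator $G(z) = a + bz$, a logistic discriminator $D(x) = \sigma(wx + c)$, and hand-derived gradients; the generator step uses the non-saturating objective discussed under "Practical Generator Loss".

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

# Toy setup: real data ~ N(3, 1); generator G(z) = a + b*z with z ~ N(0, 1);
# discriminator D(x) = sigmoid(w*x + c).
a, b = 0.0, 1.0          # generator parameters
w, c = 0.0, 0.0          # discriminator parameters
lr, k, steps, batch = 0.05, 1, 500, 64

for _ in range(steps):
    # Step 1: update D for k iterations
    # (gradient ascent on E[log D(x)] + E[log(1 - D(G(z)))]).
    for _ in range(k):
        x = rng.normal(3.0, 1.0, batch)
        gz = a + b * rng.normal(size=batch)
        d_real, d_fake = sigmoid(w * x + c), sigmoid(w * gz + c)
        w += lr * np.mean((1 - d_real) * x - d_fake * gz)
        c += lr * np.mean((1 - d_real) - d_fake)
    # Step 2: update G once, using the non-saturating objective
    # (gradient ascent on E[log D(G(z))]).
    z = rng.normal(size=batch)
    d_fake = sigmoid(w * (a + b * z) + c)
    a += lr * np.mean((1 - d_fake) * w)
    b += lr * np.mean((1 - d_fake) * w * z)

print(a)  # the generator's mean should have drifted toward the data mean of 3
```

Even in this four-parameter game the dynamics oscillate around the equilibrium rather than converging cleanly, which previews why GAN training is notoriously unstable.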

Practical Generator Loss

In early training, $G$ is often very poor, and $D$ can reject fake samples with high confidence ($D(G(z)) \approx 0$). This leads to vanishing gradients for the theoretical generator loss.

  • Theoretical Loss: $\min_G \; \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$
  • Practical Loss: $\max_G \; \mathbb{E}_{z \sim p_z}[\log D(G(z))]$ (the “non-saturating” loss)
    • This provides much stronger gradients early in training.
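The gradient gap is easy to check numerically. Taking derivatives with respect to the discriminator's logit $u$, where $D = \sigma(u)$ (a sketch of the standard argument, with a hypothetical logit value):

```python
import numpy as np

sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

# Early in training D confidently rejects fakes: D(G(z)) = sigmoid(u) ~ 0
# for a very negative logit u.
u = -8.0
d = sigmoid(u)                # ~ 3e-4

# Theoretical loss  log(1 - D(G(z))):  d/du = -sigmoid(u)  -> vanishes.
grad_theoretical = -d

# Practical loss  -log D(G(z)):  d/du = sigmoid(u) - 1  -> stays near -1.
grad_practical = d - 1.0

print(abs(grad_theoretical), abs(grad_practical))
```

The theoretical loss gives the generator an almost-zero learning signal exactly when it most needs one; the practical loss keeps the gradient close to full strength.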

Theoretical Optimality

  • Optimal Discriminator: For a fixed Generator, the optimal Discriminator is $D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$
  • Global Optimum: When $p_g = p_{\text{data}}$, the optimal Discriminator is $D^*(x) = \frac{1}{2}$ everywhere, meaning it can no longer distinguish between real and fake.
  • Divergence Minimisation: Theoretically, the GAN objective minimises the Jensen–Shannon (JS) divergence between the data distribution $p_{\text{data}}$ and the model distribution $p_g$.
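The Goodfellow et al. (2014) identity behind this claim, $C(G) = -\log 4 + 2\,\mathrm{JSD}(p_{\text{data}} \,\|\, p_g)$ where $C(G)$ is the value under the optimal discriminator, can be verified numerically on small discrete distributions (the probability values below are arbitrary toy choices):

```python
import numpy as np

# Two discrete distributions standing in for p_data and p_g (toy values).
p_data = np.array([0.5, 0.3, 0.2])
p_g    = np.array([0.2, 0.3, 0.5])

# Optimal discriminator for a fixed G:  D*(x) = p_data(x) / (p_data(x) + p_g(x)).
d_star = p_data / (p_data + p_g)

# Value of the objective under D*:
# C(G) = E_{p_data}[log D*] + E_{p_g}[log(1 - D*)]
c_g = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1.0 - d_star))

# Jensen-Shannon divergence between p_data and p_g.
m = 0.5 * (p_data + p_g)
kl = lambda p, q: np.sum(p * np.log(p / q))
jsd = 0.5 * kl(p_data, m) + 0.5 * kl(p_g, m)

print(np.isclose(c_g, -np.log(4.0) + 2.0 * jsd))  # prints True
```

Since JSD is minimised exactly when $p_g = p_{\text{data}}$, the global optimum of the game recovers the data distribution.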

Conditional Generative Models (cGANs)

While basic GANs learn the marginal distribution $p(x)$, Conditional GANs learn the conditional distribution $p(x \mid y)$, where $y$ is side information such as a class label or an input image.

  • Application: Image-to-image translation (e.g., turning a sketch into a photo, restoration, or un-cropping).
  • Why cGANs over Regression (MSE)?
    • Regression with Mean Squared Error (MSE) finds the Expected Value $\mathbb{E}[x \mid y]$, which acts as a “compromise” between all possible outcomes. This results in blurry images.
    • Conditional models account for uncertainty and can sample from the full distribution $p(x \mid y)$, producing sharper, more realistic results.
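The "compromise" effect is visible even in a scalar toy example (a hypothetical bimodal outcome, NumPy only):

```python
import numpy as np

rng = np.random.default_rng(0)

# For one conditioning input y, suppose two equally likely sharp outcomes
# (e.g. a pixel that is either dark or bright): x = -1 or x = +1.
samples = rng.choice([-1.0, 1.0], size=10_000)

# The MSE-optimal point prediction is the conditional mean E[x | y] = 0:
# a "compromise" matching neither mode (a blurry grey pixel).
mse_prediction = samples.mean()

# A conditional generative model instead samples from the full distribution,
# so every draw lands on one of the sharp, plausible modes.
draw = rng.choice([-1.0, 1.0])

print(round(mse_prediction, 2), draw)
```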

Evolution of GANs

  • GAN (2014): Original multilayer perceptron approach.
  • DCGAN (2016): Deep Convolutional GANs. Introduced architectural constraints (e.g., removing pooling, using strided convolutions) to make training stable.
    • Allows Latent Space Interpolation: Decoding values of $z$ interpolated between two points results in “smooth” transitions between generated images.
  • BigGAN (2019): Scaling GANs for high-fidelity natural image synthesis at high resolutions (e.g., $512 \times 512$).