29. Generative Adversarial Networks (GAN)¶
Overview¶
Generative Adversarial Networks (GANs) learn to generate realistic data through an adversarial game between a Generator and a Discriminator, introduced in "Generative Adversarial Networks" (Goodfellow et al., 2014).
Mathematical Background¶
1. Minimax Game¶
Goal: Generator G fools Discriminator D
Minimax objective:
min_G max_D V(D, G) = E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 - D(G(z)))]
                      └── real data term ──┘   └──── fake data term ────┘
                      D pushes D(x) → 1        D pushes D(G(z)) → 0
D's goal: Maximize V (distinguish real from fake)
G's goal: Minimize V (fool D into thinking fake is real)
Optimal discriminator:
D*(x) = p_data(x) / (p_data(x) + p_g(x))
At Nash equilibrium: p_g = p_data, D*(x) = 1/2
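A quick numeric sanity check of D* at a single point x (a sketch with arbitrary example density values):

import numpy as np

p_data, p_g = 0.7, 0.3                        # example density values at some fixed x

d = np.linspace(1e-4, 1 - 1e-4, 100001)       # candidate discriminator outputs
v = p_data * np.log(d) + p_g * np.log(1 - d)  # pointwise contribution to V(D, G)

print(d[np.argmax(v)])                        # ~0.7: maximizer found numerically
print(p_data / (p_data + p_g))                # 0.7: matches D*(x) = p_data / (p_data + p_g)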
2. Training Dynamics¶
Alternating optimization:
Step 1: Update D (fix G)
Maximize log D(x) + log(1 - D(G(z)))
→ D learns to classify real vs fake
Step 2: Update G (fix D)
Minimize log(1 - D(G(z)))
or Maximize log D(G(z)) → Non-saturating variant (better gradient)
Why maximize log D(G(z))?
- Early training: G is bad → D(G(z)) ≈ 0
- log(1 - D(G(z))) is then near log(1) = 0 and almost flat → vanishing gradient for G
- log D(G(z)) is steep in the same region → provides a stronger gradient
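A tiny PyTorch illustration of this gradient difference (a sketch; the logit value is chosen so that D(G(z)) ≈ 0.02, as in early training):

import torch

logit = torch.tensor(-4.0, requires_grad=True)  # discriminator logit for a fake sample
d_fake = torch.sigmoid(logit)                   # D(G(z)) ~ 0.018

loss_sat = torch.log(1 - d_fake)                # saturating G objective: minimize log(1 - D(G(z)))
grad_sat, = torch.autograd.grad(loss_sat, logit, retain_graph=True)

loss_ns = -torch.log(d_fake)                    # non-saturating: minimize -log D(G(z))
grad_ns, = torch.autograd.grad(loss_ns, logit)

print(grad_sat.item())                          # ~ -0.018  (nearly vanishes)
print(grad_ns.item())                           # ~ -0.982  (strong learning signal)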
GAN Training Loop:

for epoch in epochs:
    for real_batch in dataloader:
        # 1. Update Discriminator
        z ~ N(0, I)
        fake = G(z)
        loss_D = -log D(real) - log(1 - D(fake))
        D.step()

        # 2. Update Generator
        z ~ N(0, I)
        fake = G(z)
        loss_G = -log D(fake)
        G.step()
3. Loss Functions¶
Original GAN (Minimax):
L_D = -[log D(x) + log(1 - D(G(z)))]
L_G = -log D(G(z)) (non-saturating)
WGAN (Wasserstein GAN):
L_D = -[D(x) - D(G(z))] (no sigmoid, critic instead)
L_G = -D(G(z))
+ Weight clipping or Gradient Penalty
LSGAN (Least Squares GAN):
L_D = (D(x) - 1)² + D(G(z))²
L_G = (D(G(z)) - 1)²
Hinge Loss (Spectral Norm GAN):
L_D = -min(0, -1 + D(x)) - min(0, -1 - D(G(z)))
L_G = -D(G(z))
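The same losses written as PyTorch functions (a minimal sketch; d_real and d_fake denote raw, pre-sigmoid discriminator outputs for real and generated batches):

import torch
import torch.nn.functional as F

def gan_d_loss(d_real, d_fake):
    # Original GAN: binary cross-entropy on logits (real -> 1, fake -> 0)
    return (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
            F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

def gan_g_loss(d_fake):
    # Non-saturating generator loss: -log D(G(z))
    return F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))

def wgan_d_loss(d_real, d_fake):
    # Critic loss; combine with weight clipping or a gradient penalty
    return -(d_real.mean() - d_fake.mean())

def wgan_g_loss(d_fake):
    return -d_fake.mean()

def lsgan_d_loss(d_real, d_fake):
    return ((d_real - 1) ** 2).mean() + (d_fake ** 2).mean()

def lsgan_g_loss(d_fake):
    return ((d_fake - 1) ** 2).mean()

def hinge_d_loss(d_real, d_fake):
    # Equivalent to -min(0, -1 + D(x)) - min(0, -1 - D(G(z))), averaged over the batch
    return F.relu(1 - d_real).mean() + F.relu(1 + d_fake).mean()

def hinge_g_loss(d_fake):
    return -d_fake.mean()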
DCGAN Architecture¶
Deep Convolutional GAN (Radford et al., 2015): architectural guidelines for stable GAN training.
Generator (64×64 RGB images)¶
Latent Code z (100-dim)
    ↓
Linear(100 → 4×4×1024) + BatchNorm + ReLU
    ↓
Reshape → (4×4×1024)
    ↓
ConvTranspose2d(1024→512, k=4, s=2, p=1) → (8×8×512)
    ↓ BatchNorm + ReLU
ConvTranspose2d(512→256, k=4, s=2, p=1) → (16×16×256)
    ↓ BatchNorm + ReLU
ConvTranspose2d(256→128, k=4, s=2, p=1) → (32×32×128)
    ↓ BatchNorm + ReLU
ConvTranspose2d(128→3, k=4, s=2, p=1) → (64×64×3)
    ↓ Tanh
Output (64×64×3, range [-1, 1])
Key design choices:
- No fully connected layers (except first projection)
- Use transposed convolutions for upsampling
- BatchNorm in all layers except output
- ReLU activation in G
- Tanh output (images normalized to [-1, 1])
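A minimal PyTorch sketch of this generator, following the diagram above (assumes a 100-dim latent code and 64×64 RGB output; the class name is illustrative, not from the repository):

import torch.nn as nn

class DCGANGenerator(nn.Module):
    def __init__(self, latent_dim=100, out_ch=3):
        super().__init__()
        # Project z to a 4x4x1024 feature map (the only fully connected layer)
        self.project = nn.Sequential(
            nn.Linear(latent_dim, 4 * 4 * 1024),
            nn.BatchNorm1d(4 * 4 * 1024),
            nn.ReLU(inplace=True),
        )
        # Upsample 4x4 -> 64x64 with strided transposed convolutions
        self.net = nn.Sequential(
            nn.ConvTranspose2d(1024, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.ReLU(True),  # 8x8
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(True),   # 16x16
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),   # 32x32
            nn.ConvTranspose2d(128, out_ch, 4, 2, 1), nn.Tanh(),                         # 64x64, [-1, 1]
        )

    def forward(self, z):
        x = self.project(z).view(z.size(0), 1024, 4, 4)
        return self.net(x)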
Discriminator (64×64 RGB images)¶
Input (64×64×3)
    ↓
Conv2d(3→128, k=4, s=2, p=1) → (32×32×128)
    ↓ LeakyReLU(0.2)
Conv2d(128→256, k=4, s=2, p=1) → (16×16×256)
    ↓ BatchNorm + LeakyReLU(0.2)
Conv2d(256→512, k=4, s=2, p=1) → (8×8×512)
    ↓ BatchNorm + LeakyReLU(0.2)
Conv2d(512→1024, k=4, s=2, p=1) → (4×4×1024)
    ↓ BatchNorm + LeakyReLU(0.2)
Conv2d(1024→1, k=4, s=1, p=0) → (1×1×1)
    ↓ Sigmoid (or remove for WGAN)
Output (scalar probability)
Key design choices:
- No fully connected layers (the final 4×4 convolution plays the role of the classifier head)
- Strided convolutions for downsampling (no pooling)
- BatchNorm in all layers except input/output
- LeakyReLU activation (α=0.2)
- Sigmoid output for binary classification
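A matching PyTorch sketch of this discriminator (same assumptions; it returns a raw logit so the sigmoid can be folded into the loss, or dropped entirely for WGAN):

import torch.nn as nn

class DCGANDiscriminator(nn.Module):
    def __init__(self, in_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 128, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),                       # 32x32
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),    # 16x16
            nn.Conv2d(256, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2, inplace=True),    # 8x8
            nn.Conv2d(512, 1024, 4, 2, 1), nn.BatchNorm2d(1024), nn.LeakyReLU(0.2, inplace=True),  # 4x4
            nn.Conv2d(1024, 1, 4, 1, 0),                                                           # 1x1 logit
        )

    def forward(self, x):
        # Raw logit; apply torch.sigmoid (or use BCEWithLogitsLoss) for the original GAN objective
        return self.net(x).view(x.size(0), 1)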
DCGAN Guidelines¶
1. Replace pooling with strided convolutions (D) / transposed convs (G)
2. Use BatchNorm in both G and D
3. Remove fully connected hidden layers
4. Use ReLU in G (except output: Tanh)
5. Use LeakyReLU in D (α=0.2)
Training Techniques¶
1. Label Smoothing¶
Problem: D becomes too confident (D(real) → 1, D(fake) → 0)
→ Vanishing gradients for G
Solution: Smooth labels
Real labels: 1.0 → 0.9 (one-sided label smoothing)
Fake labels: 0.0 (keep as is)
loss_D_real = BCE(D(real), 0.9) # Instead of 1.0
loss_D_fake = BCE(D(fake), 0.0)
2. Feature Matching¶
Problem: G optimizes for fooling D, not generating realistic samples
Solution: Match intermediate features
loss_G = ||E[f(x)] - E[f(G(z))]||²
Where f(·) is an intermediate layer of D
Stabilizes training, reduces mode collapse
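A minimal sketch of a feature-matching generator loss; it assumes the discriminator exposes an intermediate-feature method, here called D.features, which is a hypothetical API:

import torch

def feature_matching_loss(D, real_images, fake_images):
    # Match the batch-mean of an intermediate D feature between real and generated data
    f_real = D.features(real_images).mean(dim=0).detach()  # E[f(x)], treated as a fixed target
    f_fake = D.features(fake_images).mean(dim=0)           # E[f(G(z))], gradients flow into G
    return ((f_real - f_fake) ** 2).sum()                  # ||E[f(x)] - E[f(G(z))]||^2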
3. Minibatch Discrimination¶
Problem: G produces limited variety (mode collapse)
Solution: Let D look at entire batch
1. Extract features from each sample
2. Compute similarity within batch
3. Append batch statistics to each sample
D can detect if G produces identical samples
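A sketch of the "append batch statistics" idea, using the simpler minibatch standard-deviation layer (in the spirit of minibatch discrimination, though not the full Salimans et al. formulation):

import torch
import torch.nn as nn

class MinibatchStdDev(nn.Module):
    # Appends one extra channel holding the average per-feature std across the batch,
    # so D can tell when the generator produces near-identical samples.
    def forward(self, x):                        # x: (B, C, H, W)
        std = x.std(dim=0, unbiased=False)       # per-location variation across the batch
        stat = std.mean().expand(x.size(0), 1, x.size(2), x.size(3))
        return torch.cat([x, stat], dim=1)       # (B, C+1, H, W)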
4. Spectral Normalization¶
Problem: Discriminator gradients explode
Solution: Normalize weight matrices
W_SN = W / σ(W)
Where σ(W) is the largest singular value of W
Stabilizes training (Miyato et al., 2018)
Used in BigGAN, StyleGAN
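In PyTorch this is available as a built-in wrapper that estimates σ(W) by power iteration (a usage sketch):

import torch.nn as nn
from torch.nn.utils import spectral_norm

# Wrap each weight layer of the discriminator so its weight is divided by sigma(W)
conv = spectral_norm(nn.Conv2d(128, 256, 4, 2, 1))
fc   = spectral_norm(nn.Linear(256, 1))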
5. Progressive Growing¶
Start training at low resolution (4×4)
Gradually add layers to reach high resolution (1024×1024)
4×4 → 8×8 → 16×16 → ... → 1024×1024
Smooth transition between resolutions
Used in ProGAN, StyleGAN
Mode Collapse¶
1. What is Mode Collapse?¶
Problem: G produces limited variety
Modes of the data distribution:

                 Mode 1   Mode 2   Mode 3
Real data:        ███      ███      ███
Healthy GAN:      ███      ███      ███
Mode collapse:    ███
Full collapse:    ███ (a single output for every z)

The generator ignores parts of the data distribution.
2. Detecting Mode Collapse¶
Symptoms:
1. Generated samples look similar
2. Low diversity despite different z
3. Training loss oscillates
4. NLL (Negative Log-Likelihood) high despite low FID
Metrics:
- Inception Score (IS): measures diversity + quality
- Fréchet Inception Distance (FID): distribution distance
- Precision/Recall: mode coverage
3. Mitigation Strategies¶
1. Unrolled GAN:
- G optimizes against future D (k steps ahead)
- Prevents G from exploiting current D
2. Minibatch Discrimination:
- D detects lack of diversity
3. WGAN / WGAN-GP:
- Smoother gradient flow
- Better training stability
4. Multiple Discriminators:
- Each D captures different modes
5. Regularization:
- Add noise to D inputs
- Dropout in D
File Structure¶
14_GAN/
├── README.md
├── pytorch_lowlevel/
│   ├── dcgan_mnist.py        # DCGAN on MNIST (28×28)
│   └── dcgan_cifar.py        # DCGAN on CIFAR-10 (32×32)
├── paper/
│   ├── dcgan_paper.py        # Full DCGAN (64×64)
│   ├── wgan_gp.py            # WGAN with Gradient Penalty
│   ├── stylegan_simple.py    # Simplified StyleGAN
│   └── conditional_gan.py    # Conditional GAN (cGAN)
└── exercises/
    ├── 01_mode_collapse.md   # Diagnose mode collapse
    └── 02_spectral_norm.md   # Implement spectral normalization
Core Concepts¶
1. GAN Variants¶
Conditional GAN (cGAN):
- Add class label c: G(z, c), D(x, c)
- Controlled generation (e.g., generate digit 7)
WGAN (Wasserstein GAN):
- Replace JS divergence with Wasserstein distance
- No sigmoid in D (becomes critic)
- Weight clipping or gradient penalty
- More stable training
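A minimal sketch of the gradient penalty used by WGAN-GP (interpolates real and fake images and pushes the critic's gradient norm toward 1):

import torch

def gradient_penalty(critic, real, fake):
    # Random per-sample interpolation between real and generated images (both (B, C, H, W))
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads, = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()

# Typical critic loss:
# loss_D = -(critic(real).mean() - critic(fake).mean()) + 10.0 * gradient_penalty(critic, real, fake)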
StyleGAN:
- Progressive architecture
- Style modulation (AdaIN)
- Disentangled latent space W
- State-of-the-art image quality
CycleGAN:
- Unpaired image-to-image translation
- Cycle consistency loss: G(F(x)) ≈ x
- Use cases: horse↔zebra, summer↔winter
2. Evaluation Metrics¶
Inception Score (IS):
IS = exp(E[KL(p(y|x) || p(y))])
Higher is better (quality + diversity)
Range: 1 to C (num classes)
Fréchet Inception Distance (FID):
FID = ||μ_real - μ_fake||² + Tr(Σ_real + Σ_fake - 2√(Σ_real·Σ_fake))
Lower is better (closer to real distribution)
Gold standard for GANs
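A sketch of computing FID from precomputed Inception features (NumPy arrays of shape (N, 2048)); extracting the features requires an InceptionV3 forward pass, which is omitted here:

import numpy as np
from scipy.linalg import sqrtm

def fid(feat_real, feat_fake):
    # feat_real, feat_fake: (N, 2048) Inception activations for real and generated images
    mu_r, mu_f = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_f = np.cov(feat_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):              # discard tiny imaginary parts from numerical error
        covmean = covmean.real
    return float(np.sum((mu_r - mu_f) ** 2) + np.trace(cov_r + cov_f - 2 * covmean))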
Precision/Recall:
Precision = quality (fake samples are realistic)
Recall = coverage (all modes captured)
3. Training Tips¶
1. Learning rates:
- D: 2e-4 (Adam, β₁=0.5, β₂=0.999)
- G: 2e-4 (same optimizer settings)
2. Update frequency:
- 1 G update per 1-5 D updates
- D should be slightly ahead
3. Initialization:
- Xavier/He initialization
- BatchNorm parameters: γ=1, β=0
4. Data:
- Normalize images to [-1, 1] (Tanh output)
- Random flip augmentation
5. Latent code:
- z ~ N(0, I), dim = 100-512
- Can use uniform U(-1, 1)
6. Monitoring:
- Log D(real), D(fake) (should hover around 0.5)
- Generate fixed z samples every epoch
- Watch for mode collapse (repetitive samples)
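A short sketch of the optimizer setup and fixed-z monitoring described above (assumes G and D are already-constructed generator/discriminator modules; all names are illustrative):

import torch

latent_dim = 100
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))

fixed_z = torch.randn(64, latent_dim)         # same z every epoch -> comparable sample grids
# After each epoch:
#   with torch.no_grad():
#       grid = G(fixed_z)                     # repetitive-looking samples suggest mode collapse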
Implementation Levels¶
Level 2: PyTorch Low-Level (pytorch_lowlevel/)¶
- Build DCGAN architecture from scratch
- Implement alternating training loop
- Train on MNIST and CIFAR-10
- Visualize generated samples
Level 3: Paper Implementation (paper/)¶
- Full DCGAN with training tricks
- WGAN with Gradient Penalty
- Conditional GAN (class-conditional)
- Simplified StyleGAN with style modulation
- FID/IS evaluation
Training Loop¶
# Pseudocode
for epoch in epochs:
    for real_images, _ in dataloader:
        batch_size = real_images.size(0)

        # ========== Train Discriminator ==========
        # Real images
        real_labels = torch.ones(batch_size, 1) * 0.9   # Label smoothing
        output_real = D(real_images)
        loss_D_real = BCE(output_real, real_labels)

        # Fake images
        z = torch.randn(batch_size, latent_dim)
        fake_images = G(z)
        fake_labels = torch.zeros(batch_size, 1)
        output_fake = D(fake_images.detach())           # Detach to avoid G gradients
        loss_D_fake = BCE(output_fake, fake_labels)

        # Total D loss
        loss_D = loss_D_real + loss_D_fake
        optimizer_D.zero_grad()
        loss_D.backward()
        optimizer_D.step()

        # ========== Train Generator ==========
        z = torch.randn(batch_size, latent_dim)
        fake_images = G(z)
        output = D(fake_images)
        real_labels = torch.ones(batch_size, 1)
        loss_G = BCE(output, real_labels)               # G wants D(fake) → 1
        optimizer_G.zero_grad()
        loss_G.backward()
        optimizer_G.step()
Sampling¶
# Generate new images
z = torch.randn(64, latent_dim) # Batch of 64
with torch.no_grad():
    fake_images = G(z)
# Denormalize from [-1, 1] to [0, 1]
fake_images = (fake_images + 1) / 2
# For conditional GAN
labels = torch.randint(0, 10, (64,)) # Generate 64 samples from 10 classes
fake_images = G(z, labels)
Learning Checklist¶
- [ ] Understand minimax game formulation
- [ ] Implement DCGAN architecture guidelines
- [ ] Master alternating training loop
- [ ] Recognize mode collapse symptoms
- [ ] Implement label smoothing and spectral normalization
- [ ] Understand WGAN and gradient penalty
- [ ] Compute FID and Inception Score
- [ ] Implement conditional GAN
References¶
- Goodfellow et al. (2014). "Generative Adversarial Networks"
- Radford et al. (2015). "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks"
- Arjovsky et al. (2017). "Wasserstein GAN"
- Gulrajani et al. (2017). "Improved Training of Wasserstein GANs"
- Miyato et al. (2018). "Spectral Normalization for GANs"
- Karras et al. (2019). "A Style-Based Generator Architecture for Generative Adversarial Networks"
- ../Deep_Learning/15_GAN.md