33. Diffusion Models (DDPM)

Previous: Diffusion Models | Next: CLIP and Multimodal Learning



Overview

Denoising Diffusion Probabilistic Models (DDPM) are powerful generative models that produce data by learning to reverse a gradual noise-adding process. Introduced in "Denoising Diffusion Probabilistic Models" (Ho et al., 2020).


Mathematical Background

1. Forward Diffusion Process

Goal: gradually add Gaussian noise to the data xβ‚€

q(xβ‚œ|xβ‚œβ‚‹β‚) = N(xβ‚œ; √(1-Ξ²β‚œ)xβ‚œβ‚‹β‚, Ξ²β‚œI)

where:
- xβ‚€: original data
- xβ‚œ: noised data at timestep t
- Ξ²β‚œ: noise schedule (β₁, ..., β_T)
- T: total number of timesteps (typically 1000)

Closed form (with Ξ±β‚œ = 1 - Ξ²β‚œ, αΎ±β‚œ = βˆα΅’β‚Œβ‚α΅— Ξ±α΅’):
q(xβ‚œ|xβ‚€) = N(xβ‚œ; βˆšαΎ±β‚œ xβ‚€, (1-αΎ±β‚œ)I)

xβ‚œ = βˆšαΎ±β‚œ xβ‚€ + √(1-αΎ±β‚œ) Ξ΅,  Ξ΅ ~ N(0, I)

As t β†’ T: xβ‚œ β†’ N(0, I) (pure noise)
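The closed form means xβ‚œ can be sampled in a single step rather than by iterating t times. A minimal PyTorch sketch, assuming the linear schedule values described later in this lesson (all names here are illustrative, not from a specific codebase):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)      # linear schedule β₁ .. β_T
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)   # ᾱ_t = Π_{i=1}^t Ξ±_i

def q_sample(x0, t, noise):
    """One-shot forward sample x_t = √ᾱ_t·x0 + √(1-ᾱ_t)·Ρ (t is 0-indexed here)."""
    a = alpha_bar[t].view(-1, 1, 1, 1)     # reshape to broadcast over C, H, W
    return torch.sqrt(a) * x0 + torch.sqrt(1.0 - a) * noise

x0 = torch.randn(4, 3, 32, 32)             # stand-in "images"
noise = torch.randn_like(x0)
xt = q_sample(x0, torch.tensor([0, 10, 500, 999]), noise)
```

Note that αΎ± at the final step is nearly zero, so x_T is almost pure noise, matching the limit above.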

2. Reverse Diffusion Process

Goal: learn the denoising distribution p(xβ‚œβ‚‹β‚|xβ‚œ)

True posterior (tractable only when conditioned on xβ‚€; q(xβ‚œβ‚‹β‚|xβ‚œ) alone is intractable):
q(xβ‚œβ‚‹β‚|xβ‚œ, xβ‚€) = N(xβ‚œβ‚‹β‚; ΞΌΜƒβ‚œ(xβ‚œ, xβ‚€), Ξ²Μƒβ‚œI)

where:
ΞΌΜƒβ‚œ(xβ‚œ, xβ‚€) = (βˆšαΎ±β‚œβ‚‹β‚ Ξ²β‚œ)/(1-αΎ±β‚œ) xβ‚€ + (βˆšΞ±β‚œ(1-αΎ±β‚œβ‚‹β‚))/(1-αΎ±β‚œ) xβ‚œ
Ξ²Μƒβ‚œ = (1-αΎ±β‚œβ‚‹β‚)/(1-αΎ±β‚œ) Β· Ξ²β‚œ

Learned reverse process:
pΞΈ(xβ‚œβ‚‹β‚|xβ‚œ) = N(xβ‚œβ‚‹β‚; ΞΌΞΈ(xβ‚œ, t), Σθ(xβ‚œ, t))

Simplification: predict the noise Ξ΅ instead of the mean
Ρθ(xβ‚œ, t) β‰ˆ Ξ΅

3. Training Objective

Simplified objective (derived from the variational lower bound, ELBO):
L_simple = Eβ‚œ,xβ‚€,Ξ΅[||Ξ΅ - Ρθ(xβ‚œ, t)||Β²]

where:
- t ~ Uniform(1, T)
- xβ‚€ ~ q(xβ‚€)
- Ξ΅ ~ N(0, I)
- xβ‚œ = βˆšαΎ±β‚œ xβ‚€ + √(1-αΎ±β‚œ) Ξ΅

Just a simple MSE loss on the predicted noise!

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Training:                              β”‚
β”‚  1. Sample xβ‚€, t, Ξ΅                     β”‚
β”‚  2. Compute xβ‚œ = βˆšαΎ±β‚œ xβ‚€ + √(1-αΎ±β‚œ) Ξ΅    β”‚
β”‚  3. Predict Ξ΅Μ‚ = Ρθ(xβ‚œ, t)              β”‚
β”‚  4. Loss = ||Ξ΅ - Ξ΅Μ‚||Β²                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

4. Sampling (Generation)

Start from x_T ~ N(0, I)

For t = T, T-1, ..., 1:
    z ~ N(0, I) if t > 1, else z = 0

    Ξ΅Μ‚ = Ρθ(xβ‚œ, t)

    xβ‚œβ‚‹β‚ = 1/βˆšΞ±β‚œ (xβ‚œ - (1-Ξ±β‚œ)/√(1-αΎ±β‚œ) Ξ΅Μ‚) + Οƒβ‚œz

where:
Οƒβ‚œ = βˆšΞ²Μƒβ‚œ or βˆšΞ²β‚œ (variance schedule)

Final: xβ‚€ is the generated sample

DDPM μ•„ν‚€ν…μ²˜

μ‹œκ°„ μž„λ² λ”©μ„ κ°–λŠ” UNet(UNet with Time Embedding)

μ‹œκ°„ μž„λ² λ”©(Sinusoidal Positional Encoding):
t (슀칼라)
    ↓
PE(t, dim) = [sin(t/10000^(0/d)), cos(t/10000^(0/d)),
              sin(t/10000^(2/d)), cos(t/10000^(2/d)), ...]
    ↓
Linear(dim→4*dim) + SiLU + Linear(4*dim→4*dim)
    ↓
time_emb (곡간 μ°¨μ›μœΌλ‘œ λΈŒλ‘œλ“œμΊμŠ€νŠΈ)
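The embedding pipeline above can be sketched in a few lines of PyTorch. This is a minimal illustration, assuming dim=128; the MLP sizes (dim β†’ 4Β·dim) follow the diagram, everything else is an implementation choice:

```python
import math
import torch

def sinusoidal_embedding(t, dim):
    """Map integer timesteps t of shape (B,) to (B, dim) sin/cos features."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = t.float()[:, None] * freqs[None, :]          # (B, half)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

dim = 128
emb = sinusoidal_embedding(torch.tensor([1, 500, 1000]), dim)

# MLP head as in the diagram: Linear(dim -> 4*dim) + SiLU + Linear(4*dim -> 4*dim)
mlp = torch.nn.Sequential(
    torch.nn.Linear(dim, 4 * dim),
    torch.nn.SiLU(),
    torch.nn.Linear(4 * dim, 4 * dim),
)
time_emb = mlp(emb)
```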


UNet structure (e.g., for a 32Γ—32Γ—3 image):

Input xβ‚œ (32Γ—32Γ—3) + time_emb
    ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Encoder (downsampling)                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Conv(3β†’64) + TimeEmb + ResBlock         β”‚ β†’ skip1
β”‚     ↓ Downsample                        β”‚
β”‚ Conv(64β†’128) + TimeEmb + ResBlock       β”‚ β†’ skip2
β”‚     ↓ Downsample                        β”‚
β”‚ Conv(128β†’256) + TimeEmb + ResBlock      β”‚ β†’ skip3
β”‚     ↓ Downsample                        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
    ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Bottleneck                             β”‚
β”‚  Conv(256β†’512) + Attention + ResBlock   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
    ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Decoder (upsampling)                   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚     ↑ Upsample + Concat(skip3)          β”‚
β”‚ Conv(512+256β†’256) + TimeEmb + ResBlock  β”‚
β”‚     ↑ Upsample + Concat(skip2)          β”‚
β”‚ Conv(256+128β†’128) + TimeEmb + ResBlock  β”‚
β”‚     ↑ Upsample + Concat(skip1)          β”‚
β”‚ Conv(128+64β†’64) + TimeEmb + ResBlock    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
    ↓
GroupNorm + SiLU + Conv(64β†’3)
    ↓
Output Ρθ(xβ‚œ, t) (32Γ—32Γ—3)

μ‹œκ°„ μž„λ² λ”©μ„ κ°–λŠ” ResBlock(ResBlock with Time Embedding)

x, time_emb β†’ ResBlock β†’ out

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  GroupNorm β†’ SiLU β†’ Conv                β”‚
β”‚       ↓                                 β”‚
β”‚  + time_emb (λΈŒλ‘œλ“œμΊμŠ€νŠΈ)              β”‚
β”‚       ↓                                 β”‚
β”‚  GroupNorm β†’ SiLU β†’ Conv                β”‚
β”‚       ↓                                 β”‚
β”‚  + skip connection (ν”„λ‘œμ μ…˜ 포함)      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
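The diagram above translates fairly directly into a module. A minimal sketch, assuming a GroupNorm group count of 8 and a 1Γ—1 convolution for the skip projection (both common choices, not mandated by the lesson):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """Minimal ResBlock with time embedding, following the diagram."""
    def __init__(self, in_ch, out_ch, time_dim):
        super().__init__()
        self.norm1 = nn.GroupNorm(8, in_ch)
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.time_proj = nn.Linear(time_dim, out_ch)
        self.norm2 = nn.GroupNorm(8, out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        # Projection on the skip path when channel counts differ
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x, t_emb):
        h = self.conv1(F.silu(self.norm1(x)))
        h = h + self.time_proj(t_emb)[:, :, None, None]   # broadcast over H, W
        h = self.conv2(F.silu(self.norm2(h)))
        return h + self.skip(x)

block = ResBlock(64, 128, 512)
out = block(torch.randn(2, 64, 16, 16), torch.randn(2, 512))
```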

λ…Έμ΄μ¦ˆ μŠ€μΌ€μ€„(Noise Schedule)

μ„ ν˜• μŠ€μΌ€μ€„(Linear Schedule)

# μ„ ν˜• μŠ€μΌ€μ€„ (Ho et al., 2020)
β₁ = 1e-4
Ξ²β‚œ = 0.02
Ξ²β‚œ = linear_interpolate(β₁, Ξ²β‚œ, t/T)

# νš¨μœ¨μ„±μ„ μœ„ν•œ 사전 계산
Ξ±β‚œ = 1 - Ξ²β‚œ
αΎ±β‚œ = βˆα΅’β‚Œβ‚α΅— Ξ±α΅’
βˆšαΎ±β‚œ, √(1-αΎ±β‚œ)  # 순방ν–₯ κ³Όμ •μ—μ„œ μ‚¬μš©

Cosine Schedule (Improved)

# Cosine schedule (Nichol & Dhariwal, 2021)
s = 0.008
f(t) = cosΒ²((t/T + s)/(1 + s) Β· Ο€/2)
αΎ±β‚œ = f(t) / f(0)
Ξ²β‚œ = 1 - αΎ±β‚œ/αΎ±β‚œβ‚‹β‚

# Smoother noise schedule; better suited to high resolutions
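A short sketch of the cosine schedule. Clipping Ξ²β‚œ at 0.999 follows Nichol & Dhariwal (2021); returning T+1 values of αΎ± so that αΎ±β‚€ = 1 is a convenience choice in this sketch:

```python
import math
import torch

def cosine_alpha_bar(T, s=0.008):
    """ᾱ_t from the cosine schedule; returns T+1 values so that ᾱ_0 = 1."""
    t = torch.arange(T + 1) / T
    f = torch.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    return f / f[0]

T = 1000
ab = cosine_alpha_bar(T)
# Ξ²_t = 1 - ᾱ_t/ᾱ_{t-1}, clipped to avoid a degenerate final step
betas = torch.clamp(1 - ab[1:] / ab[:-1], max=0.999)
```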

File Structure

13_Diffusion/
β”œβ”€β”€ README.md
β”œβ”€β”€ pytorch_lowlevel/
β”‚   β”œβ”€β”€ ddpm_mnist.py         # DDPM on MNIST (28Γ—28)
β”‚   └── ddpm_cifar.py         # DDPM on CIFAR-10 (32Γ—32)
β”œβ”€β”€ paper/
β”‚   β”œβ”€β”€ ddpm_paper.py         # Full DDPM implementation
β”‚   β”œβ”€β”€ ddim_sampling.py      # Fast DDIM sampling
β”‚   └── cosine_schedule.py    # Improved noise schedule
└── exercises/
    β”œβ”€β”€ 01_noise_schedule.md  # Noise schedule visualization
    └── 02_sampling_steps.md  # DDPM vs DDIM comparison

Key Concepts

1. DDPM vs DDIM Sampling

DDPM (Ho et al., 2020):
- Stochastic sampling (noise z added at every step)
- Requires T steps (e.g., 1000)
- High quality but slow

DDIM (Song et al., 2020):
- Deterministic sampling (z = 0)
- Skips timesteps: uses a subset [τ₁, Ο„β‚‚, ..., Ο„β‚›]
- 10-50Γ— faster (e.g., 50 steps)
- Slightly lower quality

DDIM update:
xβ‚œβ‚‹β‚ = βˆšαΎ±β‚œβ‚‹β‚ xΜ‚β‚€ + √(1-αΎ±β‚œβ‚‹β‚) Ρθ(xβ‚œ, t)

where xΜ‚β‚€ = (xβ‚œ - √(1-αΎ±β‚œ)Ρθ(xβ‚œ, t))/βˆšαΎ±β‚œ
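One deterministic DDIM step (Ξ· = 0) is a direct transcription of the two formulas above. A minimal sketch; the predicted noise `eps` would come from the trained model, and the scalar αΎ± values from a precomputed schedule:

```python
import torch

@torch.no_grad()
def ddim_step(x_t, eps, ab_t, ab_prev):
    """x_t -> x_{t-1} given predicted noise eps and ᾱ at both timesteps."""
    x0_hat = (x_t - torch.sqrt(1 - ab_t) * eps) / torch.sqrt(ab_t)
    return torch.sqrt(ab_prev) * x0_hat + torch.sqrt(1 - ab_prev) * eps

x = torch.randn(2, 3, 32, 32)
eps = torch.randn_like(x)
x_prev = ddim_step(x, eps, torch.tensor(0.5), torch.tensor(0.6))
```

A useful sanity check: if αΎ±β‚œβ‚‹β‚ = αΎ±β‚œ, the step is the identity, since the xΜ‚β‚€ decomposition is exactly inverted.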

2. Classifier Guidance

Goal: generate samples conditioned on a class y

Conditional score:
βˆ‡β‚“ log p(xβ‚œ|y) β‰ˆ βˆ‡β‚“ log p(xβ‚œ) + sΒ·βˆ‡β‚“ log p(y|xβ‚œ)
                  ─────────────   ─────────────────
                  unconditional   classifier gradient

Guided noise prediction:
Ξ΅Μ‚ = Ρθ(xβ‚œ, t) - s·√(1-αΎ±β‚œ)Β·βˆ‡β‚“ log pΟ†(y|xβ‚œ)

s: guidance scale (s > 1 β†’ stronger conditioning)

3. Classifier-Free Guidance

No separate classifier needed!

Train one model to handle both conditional and unconditional inputs:
Ρθ(xβ‚œ, t, c) (with probability p)
Ρθ(xβ‚œ, t, βˆ…) (with probability 1-p) (βˆ… = null class)

Guided prediction:
Ξ΅Μ‚ = Ρθ(xβ‚œ, t, βˆ…) + wΒ·(Ρθ(xβ‚œ, t, c) - Ρθ(xβ‚œ, t, βˆ…))

w: guidance weight (w=0 β†’ unconditional, w>1 β†’ stronger)

Used in: Stable Diffusion, DALL-E 2, Imagen

4. Training Tips

1. EMA (Exponential Moving Average):
   - Maintain ΞΈ_ema = 0.9999Β·ΞΈ_ema + 0.0001Β·ΞΈ
   - Use ΞΈ_ema for sampling

2. Progressive Training:
   - Start at a small resolution
   - Increase gradually (8Γ—8 β†’ 16Γ—16 β†’ 32Γ—32)

3. Data augmentation:
   - Random horizontal flips
   - Normalize to [-1, 1]

4. Learning rate:
   - MNIST/CIFAR: 2e-4
   - High resolution: 1e-4

5. Batch size:
   - Small images: 128-256
   - Large images: 32-64
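The EMA update from tip 1 is small enough to show in full. A minimal sketch with decay 0.9999, using a tiny `nn.Linear` as a stand-in model:

```python
import torch

@torch.no_grad()
def ema_update(ema_params, params, decay=0.9999):
    """ΞΈ_ema ← decay·θ_ema + (1 - decay)·θ, applied in place."""
    for p_ema, p in zip(ema_params, params):
        p_ema.mul_(decay).add_(p, alpha=1 - decay)

model = torch.nn.Linear(4, 4)
ema_model = torch.nn.Linear(4, 4)
ema_model.load_state_dict(model.state_dict())   # start from identical weights

# Call after each optimizer step on `model`:
ema_update(list(ema_model.parameters()), list(model.parameters()))
```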

Implementation Levels

Level 2: PyTorch Low-Level (pytorch_lowlevel/)

  • Implement the forward/reverse diffusion processes
  • Implement the (linear) noise schedule
  • Build a UNet with time embedding
  • Train on MNIST (28Γ—28) and CIFAR-10 (32Γ—32)

Level 3: Paper Implementation (paper/)

  • Full DDPM with cosine schedule
  • DDIM sampling (fast inference)
  • Classifier-free guidance
  • FID/IS evaluation metrics

Training Loop

# Pseudocode
for epoch in range(epochs):
    for x0, _ in dataloader:
        # Sample random timesteps
        t = torch.randint(1, T + 1, (batch_size,))

        # Sample noise
        noise = torch.randn_like(x0)

        # Forward diffusion: create the noised image
        # (schedule tensors are indexed by t, reshaped to broadcast over C, H, W)
        xt = (sqrt_alpha_bar[t].view(-1, 1, 1, 1) * x0
              + sqrt_one_minus_alpha_bar[t].view(-1, 1, 1, 1) * noise)

        # Predict the noise
        noise_pred = model(xt, t)

        # MSE loss
        loss = F.mse_loss(noise_pred, noise)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

μƒ˜ν”Œλ§ 루프

# DDPM μƒ˜ν”Œλ§
x = torch.randn(batch_size, 3, 32, 32)  # λ…Έμ΄μ¦ˆμ—μ„œ μ‹œμž‘

for t in reversed(range(1, T+1)):
    # λ…Έμ΄μ¦ˆ 예츑
    t_batch = torch.full((batch_size,), t)
    noise_pred = model(x, t_batch)

    # 평균 계산
    alpha_t = alpha[t]
    alpha_bar_t = alpha_bar[t]
    mean = (x - (1 - alpha_t) / sqrt(1 - alpha_bar_t) * noise_pred) / sqrt(alpha_t)

    # λ…Έμ΄μ¦ˆ μΆ”κ°€ (λ§ˆμ§€λ§‰ 단계 μ œμ™Έ)
    if t > 1:
        noise = torch.randn_like(x)
        sigma_t = sqrt(beta[t])
        x = mean + sigma_t * noise
    else:
        x = mean

# xλŠ” μƒμ„±λœ 이미지

Learning Checklist

  • [ ] Understand the forward diffusion closed form
  • [ ] Derive the reverse diffusion from the ELBO
  • [ ] Implement noise schedules (linear, cosine)
  • [ ] Build a UNet with time embedding
  • [ ] Understand DDPM vs DDIM sampling
  • [ ] Implement classifier-free guidance
  • [ ] Compute FID scores for evaluation

References

  • Ho et al. (2020). "Denoising Diffusion Probabilistic Models"
  • Song et al. (2020). "Denoising Diffusion Implicit Models"
  • Nichol & Dhariwal (2021). "Improved Denoising Diffusion Probabilistic Models"
  • Ho & Salimans (2022). "Classifier-Free Diffusion Guidance"
  • 32_Diffusion_Models.md