31. Variational Autoencoder (VAE)
Overview

The Variational Autoencoder (VAE) is a foundational generative-model architecture: it learns a latent representation of the data and can generate new samples from it. Paper: "Auto-Encoding Variational Bayes" (Kingma & Welling, 2013).
Mathematical Background

1. Generative Modeling Goal

Goal: model p(x)
- x: observed data (e.g., images)
- z: latent variable

Generative process:
z ~ p(z)       # prior (typically N(0, I))
x ~ p(x|z)     # decoder / generator

Problem: p(x) = ∫ p(x|z) p(z) dz is intractable.
2. Variational Inference

The true posterior p(z|x) is intractable,
→ so we learn an approximate distribution q(z|x) (the encoder).

ELBO (Evidence Lower BOund):
log p(x) ≥ E_q[log p(x|z)] - KL(q(z|x) || p(z))
            └──────┬──────┘   └───────┬───────┘
             Reconstruction     Regularization
                                (prior matching)
Objective to maximize:
L(θ, φ; x) = E_{q_φ(z|x)}[log p_θ(x|z)] - KL(q_φ(z|x) || p(z))
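To make the two terms concrete, here is a one-sample NumPy estimate of the ELBO with a Bernoulli decoder. The encoder outputs (μ, log σ²) and the random decoder weights W are made-up placeholders, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "encoder" output for one data point x (placeholder values, not trained):
x = (rng.random(784) > 0.5).astype(float)   # binarized input
mu = np.zeros(16)                            # q(z|x) mean
log_var = np.zeros(16)                       # q(z|x) log-variance

# Sample z via reparameterization, then a toy "decoder" producing
# Bernoulli means x_hat (a sigmoid of a small random projection).
eps = rng.standard_normal(16)
z = mu + np.exp(0.5 * log_var) * eps
W = rng.standard_normal((784, 16)) * 0.01
x_hat = 1.0 / (1.0 + np.exp(-W @ z))         # decoder output in (0, 1)

# ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z)), estimated with one z sample.
recon = np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))  # Bernoulli log-lik
kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))        # closed form
elbo = recon - kl
print(elbo, kl)
```

With μ = 0 and log σ² = 0 the posterior equals the prior, so the KL term is exactly zero and the ELBO reduces to the (negative) reconstruction term.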
3. Reparameterization Trick

Problem: sampling z ~ q(z|x) = N(μ, σ²) is not differentiable.
Solution: reparameterize the sample:
ε ~ N(0, I)
z = μ + σ ⊙ ε
Now gradients can flow back through μ and σ.
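A quick NumPy sketch of why this works: once ε is drawn, z is an ordinary differentiable function of μ and σ, which a finite-difference check confirms (dz/dμ = 1, dz/dσ = ε):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = rng.standard_normal(5)        # noise is sampled once, outside the "graph"

def sample(mu, sigma):
    # z = mu + sigma * eps: deterministic, differentiable in (mu, sigma)
    return mu + sigma * eps

mu, sigma = 1.0, 2.0
z = sample(mu, sigma)

# With eps held fixed, dz/dmu = 1 and dz/dsigma = eps exactly;
# finite differences confirm the gradient path exists.
h = 1e-6
dmu = (sample(mu + h, sigma) - z) / h
dsigma = (sample(mu, sigma + h) - z) / h
print(dmu, dsigma)
```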
┌───────────────────────────┐
│  Encoder                  │
│  x → [μ, log σ²]          │
│          ↓                │
│  Reparameterization       │
│  ε ~ N(0, I)              │
│  z = μ + σ ⊙ ε            │
│          ↓                │
│  Decoder                  │
│  z → x̂                    │
└───────────────────────────┘
4. Loss Function

L = L_recon + β · L_KL

Reconstruction loss (for images):
- Binary: BCE(x, x̂) = -Σ[x·log(x̂) + (1-x)·log(1-x̂)]
- Continuous: MSE(x, x̂) = ||x - x̂||²

KL divergence (Gaussian prior):
KL(N(μ, σ²) || N(0, 1)) = -½ Σ(1 + log σ² - μ² - σ²)

β-VAE:
β > 1: stronger disentanglement
β < 1: better reconstruction
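The closed-form Gaussian KL above can be sanity-checked against a Monte Carlo estimate; the values μ = 0.5, σ² = 0.25 below are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, var = 0.5, 0.25          # example q = N(0.5, 0.25); arbitrary choice
log_var = np.log(var)

# Closed form from the text: KL(N(mu, var) || N(0, 1))
kl_closed = -0.5 * (1 + log_var - mu**2 - var)

# Monte Carlo estimate: E_q[log q(z) - log p(z)]
z = mu + np.sqrt(var) * rng.standard_normal(200_000)
log_q = -0.5 * (np.log(2 * np.pi * var) + (z - mu) ** 2 / var)
log_p = -0.5 * (np.log(2 * np.pi) + z ** 2)
kl_mc = np.mean(log_q - log_p)
print(kl_closed, kl_mc)      # the two estimates should agree closely
```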
VAE Architecture

Standard VAE (MNIST)
Encoder:
Input (28×28×1)
  ↓
Conv2d(1→32, k=3, s=2, p=1)  → (14×14×32)
  ↓ ReLU
Conv2d(32→64, k=3, s=2, p=1) → (7×7×64)
  ↓ ReLU
Flatten → (7×7×64 = 3136)
  ↓
Linear(3136→256)
  ↓ ReLU
┌────────────────┬────────────────┐
│ Linear(256→z)  │ Linear(256→z)  │
│       μ        │     log σ²     │
└────────────────┴────────────────┘
Reparameterization:
z = μ + σ ⊙ ε,  ε ~ N(0, I)
Decoder:
z (latent_dim)
  ↓
Linear(z→256)
  ↓ ReLU
Linear(256→3136)
  ↓ ReLU
Reshape → (7×7×64)
  ↓
ConvT2d(64→32, k=3, s=2, p=1, op=1) → (14×14×32)
  ↓ ReLU
ConvT2d(32→1, k=3, s=2, p=1, op=1)  → (28×28×1)
  ↓ Sigmoid
Output (28×28×1)
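The spatial sizes in these diagrams follow from the standard conv / transposed-conv shape arithmetic; a small pure-Python check (helper names are mine) reproduces 28 → 14 → 7 and back:

```python
# Shape check for the MNIST VAE above, using the standard formulas:
#   conv:            out = (n + 2p - k) // s + 1
#   transposed conv: out = (n - 1) * s - 2p + k + op
def conv_out(n, k, s, p):
    return (n + 2 * p - k) // s + 1

def convT_out(n, k, s, p, op):
    return (n - 1) * s - 2 * p + k + op

# Encoder: 28 -> 14 -> 7
h1 = conv_out(28, k=3, s=2, p=1)
h2 = conv_out(h1, k=3, s=2, p=1)

# Decoder: 7 -> 14 -> 28 (output_padding=1 recovers the even sizes)
d1 = convT_out(7, k=3, s=2, p=1, op=1)
d2 = convT_out(d1, k=3, s=2, p=1, op=1)

flat = 7 * 7 * 64
print(h1, h2, d1, d2, flat)    # 14 7 14 28 3136
```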
File Structure

11_VAE/
├── README.md
├── numpy/
│   └── vae_numpy.py          # NumPy VAE (forward pass only)
├── pytorch_lowlevel/
│   └── vae_lowlevel.py       # PyTorch low-level VAE
├── paper/
│   └── vae_paper.py          # paper reproduction
└── exercises/
    ├── 01_latent_space.md    # latent-space visualization
    └── 02_interpolation.md   # latent-space interpolation
Key Concepts

1. Latent Space

Properties of a good latent space:
1. Continuity: nearby points decode to similar outputs
2. Completeness: every point decodes to a meaningful output
3. Disentanglement: each dimension controls an independent factor of variation

VAE vs. AE:
- AE: point embeddings → discontinuous, with empty regions
- VAE: distribution embeddings → continuous, so sampling works
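Continuity is what makes latent interpolation meaningful: decoding points along a straight line between two encodings yields a smooth morph. A minimal NumPy sketch of the interpolation path itself, with toy random vectors standing in for real encodings:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two toy "encodings" (random placeholders; in practice these come
# from the encoder applied to two real inputs).
z1, z2 = rng.standard_normal(16), rng.standard_normal(16)

# Linear interpolation in latent space: each intermediate z is still a
# plausible decoder input because the VAE latent space has no holes.
ts = np.linspace(0.0, 1.0, 5)
path = np.stack([(1 - t) * z1 + t * z2 for t in ts])
print(path.shape)              # (5, 16): 5 points to feed the decoder
```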
2. VAE Variants

β-VAE (β > 1):
- stronger KL regularization
- better disentanglement
- worse reconstruction

Conditional VAE (CVAE):
- adds a condition c: q(z|x, c), p(x|z, c)
- enables conditional generation

VQ-VAE:
- discrete codebook instead of a continuous latent space
- used in DALL-E, AudioLM, and others
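The core VQ-VAE operation is a nearest-neighbor lookup into the codebook; a minimal NumPy sketch with toy sizes (K = 8 codes of dimension 4 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.standard_normal((8, 4))     # K=8 codes, D=4 dims (toy sizes)
z_e = rng.standard_normal((3, 4))          # encoder outputs for 3 positions

# Quantize: replace each encoder output with its nearest codebook
# entry under squared L2 distance; idx is the discrete representation.
dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (3, 8)
idx = dists.argmin(axis=1)                                        # discrete codes
z_q = codebook[idx]                                               # (3, 4)
print(idx, z_q.shape)
```

Downstream models (e.g. an autoregressive prior) then operate on the discrete `idx` sequence rather than on continuous latents.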
3. Training Stability

KL annealing:
- start with β = 0 (focus on reconstruction)
- gradually increase β → 1 (add the regularizer)

Free bits:
- guarantees a minimum KL (prevents posterior collapse)
- L_KL = max(KL, λ)
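Both tricks are one-liners; this sketch assumes a linear warmup schedule and a scalar free-bits floor (the exact schedule and λ are choices, not fixed by the method):

```python
# KL annealing: ramp beta linearly from 0 to 1 over `warmup` steps.
def beta_at(step, warmup=1000):
    return min(1.0, step / warmup)

# Free bits: never let the KL penalty drop below a floor `lam`, so the
# posterior keeps at least that much information (prevents collapse).
def free_bits_kl(kl, lam=0.5):
    return max(kl, lam)

print(beta_at(0), beta_at(500), beta_at(2000))     # 0.0 0.5 1.0
print(free_bits_kl(0.1), free_bits_kl(2.0))        # 0.5 2.0
```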
Implementation Levels

Level 2: PyTorch Low-Level (pytorch_lowlevel/)
- uses F.conv2d and F.linear directly
- implements the reparameterization trick
- implements the ELBO loss

Level 3: Paper Implementation (paper/)
- β-VAE implementation
- CVAE (conditional) implementation
- latent-space visualization
Learning Checklist
- [ ] Understand the ELBO derivation
- [ ] Understand the reparameterization trick
- [ ] Compute the KL divergence
- [ ] Understand the role of β
- [ ] Visualize the latent space
- [ ] Implement a conditional VAE
References
- Kingma & Welling (2013). "Auto-Encoding Variational Bayes"
- Higgins et al. (2017). "Ξ²-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework"
- 30_Generative_Models_VAE.md