# 03. CNN (LeNet)

## Overview

LeNet-5, proposed by Yann LeCun in 1998, is the first successful convolutional neural network. It demonstrated strong performance on handwritten digit recognition (MNIST) and became the foundation of modern CNNs.
## Mathematical Background

### 1. The Convolution Operation

2D convolution:

    (I * K)[i, j] = Σ_m Σ_n I[i+m, j+n] · K[m, n]

where:

- I: input image (H × W)
- K: kernel/filter (k_h × k_w)
- *: the convolution operation

Output size:

    H_out = ⌊(H_in + 2P - K) / S⌋ + 1
    W_out = ⌊(W_in + 2P - K) / S⌋ + 1

- P: padding
- S: stride
- K: kernel size
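A minimal NumPy sketch of this formula and the output-size rule, assuming a single-channel input (the loop form; the im2col trick later in this README is the faster route):

```python
import numpy as np

def conv2d(I, K, stride=1, padding=0):
    """Direct-loop 2D convolution (cross-correlation form) of I (H x W) with K (k x k)."""
    if padding > 0:
        I = np.pad(I, padding)
    H, W = I.shape                         # H, W already include the padding here
    k = K.shape[0]
    H_out = (H - k) // stride + 1          # matches H_out = floor((H_in + 2P - K) / S) + 1
    W_out = (W - k) // stride + 1
    Y = np.zeros((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            # (I * K)[i, j] = sum_m sum_n I[i+m, j+n] * K[m, n]
            Y[i, j] = np.sum(I[i*stride:i*stride+k, j*stride:j*stride+k] * K)
    return Y

I = np.arange(16.0).reshape(4, 4)
K = np.ones((3, 3)) / 9.0                  # 3x3 mean filter
print(conv2d(I, K).shape)                  # (2, 2), as the formula predicts
```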
### 2. Pooling Operations

Max pooling:

    y[i, j] = max(x[i*s : i*s+k, j*s : j*s+k])

Average pooling:

    y[i, j] = mean(x[i*s : i*s+k, j*s : j*s+k])
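A minimal sketch of both modes for a single-channel input; `pool2d`, `k`, and `s` are ad-hoc names matching the formulas above:

```python
import numpy as np

def pool2d(x, k=2, s=2, mode="max"):
    """Pool x (H x W) with window k and stride s."""
    H_out = (x.shape[0] - k) // s + 1
    W_out = (x.shape[1] - k) // s + 1
    y = np.zeros((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            window = x[i*s : i*s+k, j*s : j*s+k]
            y[i, j] = window.max() if mode == "max" else window.mean()
    return y

x = np.arange(16.0).reshape(4, 4)
print(pool2d(x, mode="max"))   # [[ 5.  7.] [13. 15.]]
print(pool2d(x, mode="avg"))   # [[ 2.5  4.5] [10.5 12.5]]
```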
Purpose:

1. Reduce spatial resolution (down-sampling)
2. Increase translation invariance
3. Reduce parameter count and computation
### 3. Backpropagation through Convolution

Forward:

    Y = X * W + b

Backward:

    ∂L/∂W = X * ∂L/∂Y          (cross-correlation)
    ∂L/∂X = ∂L/∂Y * rot180(W)  (full convolution)
    ∂L/∂b = Σ ∂L/∂Y
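A sketch of these three gradients for the single-channel, stride-1, no-padding case (`conv_backward` is an ad-hoc name); the scatter-add into `dX` is exactly the full convolution of `dY` with the 180°-rotated kernel:

```python
import numpy as np

def conv_backward(X, W, dY):
    """Gradients for Y = X * W + b (single channel, stride 1, no padding)."""
    k = W.shape[0]
    dW = np.zeros_like(W)
    dX = np.zeros_like(X)
    db = dY.sum()                              # dL/db: sum over all output positions
    for i in range(dY.shape[0]):
        for j in range(dY.shape[1]):
            dW += X[i:i+k, j:j+k] * dY[i, j]   # dL/dW: X cross-correlated with dL/dY
            dX[i:i+k, j:j+k] += W * dY[i, j]   # dL/dX: full conv of dL/dY with rot180(W)
    return dW, dX, db

X, W = np.random.rand(5, 5), np.random.rand(3, 3)
dY = np.ones((3, 3))                           # stand-in upstream gradient
dW, dX, db = conv_backward(X, W, dY)
```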
## LeNet-5 Architecture

    Input: 32×32 grayscale image

    Layer 1: Conv (5×5, 6 filters)   → 28×28×6
             + Tanh + AvgPool (2×2)  → 14×14×6
    Layer 2: Conv (5×5, 16 filters)  → 10×10×16
             + Tanh + AvgPool (2×2)  → 5×5×16
    Layer 3: Conv (5×5, 120 filters) → 1×1×120
             + Tanh
    Layer 4: FC (120 → 84) + Tanh
    Layer 5: FC (84 → 10)            (output)
Parameter counts:

- Conv1: 5×5×1×6 + 6 = 156
- Conv2: 5×5×6×16 + 16 = 2,416
- Conv3: 5×5×16×120 + 120 = 48,120
- FC1: 120×84 + 84 = 10,164
- FC2: 84×10 + 10 = 850
- Total: 61,706 parameters (verified below)
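A quick sanity check of these counts. Note they assume each conv layer connects to all input channels; the 1998 paper's C3 layer actually uses a sparse connection table, so its figure is slightly lower:

```python
# conv layer: k*k*C_in*C_out weights + C_out biases
conv = lambda k, c_in, c_out: k * k * c_in * c_out + c_out
# fully connected layer: n_in*n_out weights + n_out biases
fc = lambda n_in, n_out: n_in * n_out + n_out

counts = [conv(5, 1, 6), conv(5, 6, 16), conv(5, 16, 120), fc(120, 84), fc(84, 10)]
print(counts)        # [156, 2416, 48120, 10164, 850]
print(sum(counts))   # 61706
```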
## File Structure

    03_CNN_LeNet/
    ├── README.md                    # this file
    ├── numpy/
    │   ├── conv_numpy.py            # convolution in NumPy
    │   ├── pooling_numpy.py         # pooling in NumPy
    │   └── lenet_numpy.py           # full LeNet in NumPy
    ├── pytorch_lowlevel/
    │   └── lenet_lowlevel.py        # uses F.conv2d, not nn.Conv2d
    ├── paper/
    │   └── lenet_paper.py           # faithful reproduction of the paper architecture
    └── exercises/
        ├── 01_visualize_filters.md  # filter visualization
        └── 02_receptive_field.md    # receptive-field calculation
## Key Concepts

### 1. Local Connectivity

Fully connected:

- every input is connected to every output
- parameters: H_in × W_in × H_out × W_out

Convolution:

- connects only a local region (the kernel size)
- parameters: K × K × C_in × C_out
- efficient thanks to parameter sharing (see the comparison below)
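To make the gap concrete, a back-of-the-envelope comparison using LeNet's first-layer geometry (biases omitted; the numbers are illustrative):

```python
# Connecting a 32x32 input map to a 28x28 output map:
fc_params = 32 * 32 * 28 * 28        # dense: every pixel to every output unit
conv_params = 5 * 5 * 1 * 1          # one shared 5x5 kernel, single channel in and out
print(fc_params, conv_params)        # 802816 vs 25
```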
### 2. Parameter Sharing

The same filter is applied across the entire image
→ translation equivariance
→ the same feature is detected at any position (demonstrated below)
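A tiny check of the equivariance claim, here using `scipy.signal.correlate2d` (the cross-correlation form of convolution, as in most frameworks) in place of a conv layer:

```python
import numpy as np
from scipy.signal import correlate2d

I = np.zeros((8, 8)); I[2, 2] = 1.0                  # a single bright pixel
I_shifted = np.roll(I, (1, 1), axis=(0, 1))          # move it down-right by one
K = np.random.rand(3, 3)

Y = correlate2d(I, K, mode="valid")
Y_shifted = correlate2d(I_shifted, K, mode="valid")
# the filter response moved by exactly the same offset (away from the borders)
print(np.allclose(np.roll(Y, (1, 1), axis=(0, 1)), Y_shifted))  # True
```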
### 3. Hierarchical Features

    Layer 1:  edges, corners     (low-level)
    Layer 2:  textures, patterns (mid-level)
    Layer 3:  object parts       (high-level)
    Layer 4+: whole objects      (semantic)
## Implementation Levels

### Level 1: NumPy From Scratch (numpy/)

- convolution implemented directly with loops
- im2col optimization (sketched after this list)
- backpropagation implemented by hand
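The core of the im2col trick, in a minimal single-channel, stride-1 form (this `im2col` is a hypothetical sketch, not necessarily the repo's version): unfold every k×k patch into a row, so convolution becomes one matrix multiply.

```python
import numpy as np

def im2col(X, k):
    """Unfold every k x k patch of X into one row (stride 1, no padding)."""
    H_out, W_out = X.shape[0] - k + 1, X.shape[1] - k + 1
    cols = np.empty((H_out * W_out, k * k))
    for i in range(H_out):
        for j in range(W_out):
            cols[i * W_out + j] = X[i:i+k, j:j+k].ravel()
    return cols

X, K = np.random.rand(6, 6), np.random.rand(3, 3)
Y = (im2col(X, 3) @ K.ravel()).reshape(4, 4)   # convolution as a single matmul
```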
### Level 2: PyTorch Low-Level (pytorch_lowlevel/)

- uses F.conv2d and F.max_pool2d (shape trace below)
- no nn.Conv2d
- parameters managed by hand
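A minimal functional-API shape trace through the conv stack above, with hand-managed weights (ad-hoc initialization, shown for the shapes only). This sketch uses `F.avg_pool2d` to match the LeNet-5 table; `F.max_pool2d` would be the modern substitute:

```python
import torch
import torch.nn.functional as F

# hand-managed parameters: (out_channels, in_channels, kH, kW) per conv layer
w1, b1 = torch.randn(6, 1, 5, 5) * 0.1, torch.zeros(6)
w2, b2 = torch.randn(16, 6, 5, 5) * 0.1, torch.zeros(16)
w3, b3 = torch.randn(120, 16, 5, 5) * 0.1, torch.zeros(120)

x = torch.randn(1, 1, 32, 32)                         # N, C, H, W
x = F.avg_pool2d(torch.tanh(F.conv2d(x, w1, b1)), 2)  # -> 1 x 6 x 14 x 14
x = F.avg_pool2d(torch.tanh(F.conv2d(x, w2, b2)), 2)  # -> 1 x 16 x 5 x 5
x = torch.tanh(F.conv2d(x, w3, b3))                   # -> 1 x 120 x 1 x 1
print(x.shape)                                        # torch.Size([1, 120, 1, 1])
```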
### Level 3: Paper Implementation (paper/)

- reproduces the original paper architecture
- Tanh activation (instead of ReLU)
- average pooling (instead of max)
## Learning Checklist

- [ ] Understand the convolution formula
- [ ] Memorize the output-size formula
- [ ] Understand the im2col technique
- [ ] Derive the conv backward pass
- [ ] Understand the max-pooling backward pass
- [ ] Memorize the LeNet architecture
## References

- LeCun et al. (1998). "Gradient-Based Learning Applied to Document Recognition"
- CS231n: Convolutional Neural Networks
- 07_CNN_Basics.md