10. CNN (LeNet)
Overview
LeNet-5, proposed by Yann LeCun and colleagues in 1998, is one of the first successful Convolutional Neural Networks. It achieved excellent performance on handwritten digit recognition (MNIST) and laid the foundation for modern CNNs.
Mathematical Background
1. Convolution Operation
2D Convolution:
(I * K)[i,j] = Σ_m Σ_n I[i+m, j+n] · K[m, n]
Where:
- I: input image (H × W)
- K: kernel/filter (k_h × k_w)
- *: convolution operation
Output size:
H_out = (H_in + 2P - K) / S + 1
W_out = (W_in + 2P - K) / S + 1
- P: padding
- S: stride
- K: kernel size
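As a concrete check of the formulas above, here is a minimal NumPy sketch (function names are illustrative, not part of this repo) that computes the output size and performs a valid-mode 2D convolution with explicit loops:

```python
import numpy as np

def conv2d_output_size(h_in, w_in, k, p=0, s=1):
    """(H_in + 2P - K) // S + 1, applied to both spatial dimensions."""
    return (h_in + 2 * p - k) // s + 1, (w_in + 2 * p - k) // s + 1

def conv2d_naive(img, kernel):
    """Valid-mode 2D convolution (deep-learning convention, i.e. cross-correlation),
    no padding, stride 1, written with explicit loops for clarity."""
    h, w = img.shape
    kh, kw = kernel.shape
    h_out, w_out = h - kh + 1, w - kw + 1
    out = np.zeros((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            # (I * K)[i, j] = sum_m sum_n I[i+m, j+n] * K[m, n]
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

# A 5x5 kernel on a 32x32 input gives 28x28, as in LeNet's first layer.
print(conv2d_output_size(32, 32, 5))  # (28, 28)
```

A 2×2 stride-2 pooling of the resulting 28×28 map then gives 14×14 by the same formula.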
2. Pooling Operation
Max Pooling:
y[i,j] = max(x[i*s:i*s+k, j*s:j*s+k])
Average Pooling:
y[i,j] = mean(x[i*s:i*s+k, j*s:j*s+k])
Purpose:
1. Reduce spatial resolution (down-sampling)
2. Increase translation invariance
3. Reduce parameters/computation
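A minimal max-pooling sketch along the same lines (illustrative names, single-channel input assumed):

```python
import numpy as np

def max_pool2d(x, k=2, s=2):
    """Max pooling: y[i,j] = max(x[i*s:i*s+k, j*s:j*s+k])."""
    h, w = x.shape
    h_out, w_out = (h - k) // s + 1, (w - k) // s + 1
    out = np.zeros((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            # Take the maximum over each k-by-k window.
            out[i, j] = x[i*s:i*s+k, j*s:j*s+k].max()
    return out

x = np.arange(16.0).reshape(4, 4)
print(max_pool2d(x))  # [[ 5.  7.] [13. 15.]]
```

Average pooling (the variant LeNet-5 uses) is the same loop with `.mean()` in place of `.max()`.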
3. Backpropagation through Convolution
Forward:
Y = X * W + b
Backward:
∂L/∂W = X * ∂L/∂Y          (cross-correlation)
∂L/∂X = ∂L/∂Y * rot180(W)  (full convolution)
∂L/∂b = Σ ∂L/∂Y
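The weight-gradient identity above can be verified numerically. This sketch (illustrative, single-channel; loss L = ΣY, so ∂L/∂Y is all ones) compares the analytic gradient, X cross-correlated with ∂L/∂Y, against central finite differences:

```python
import numpy as np

def conv2d(x, w):
    """Valid-mode cross-correlation with explicit loops."""
    kh, kw = w.shape
    h_out, w_out = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    y = np.zeros((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            y[i, j] = np.sum(x[i:i+kh, j:j+kw] * w)
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 5))
w = rng.normal(size=(3, 3))

# Loss L = sum(Y), so dL/dY is a matrix of ones with Y's shape.
dy = np.ones_like(conv2d(x, w))

# Analytic gradient: dL/dW = X cross-correlated with dL/dY
# (here dy is 3x3 and x is 5x5, so the result is 3x3 like w).
dw = conv2d(x, dy)

# Numerical gradient via central differences.
eps = 1e-5
dw_num = np.zeros_like(w)
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        wp, wm = w.copy(), w.copy()
        wp[i, j] += eps
        wm[i, j] -= eps
        dw_num[i, j] = (conv2d(x, wp).sum() - conv2d(x, wm).sum()) / (2 * eps)

assert np.allclose(dw, dw_num, atol=1e-6)
```

The same finite-difference check works for ∂L/∂X and ∂L/∂b.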
LeNet-5 Architecture
Input: 32×32 grayscale image
Layer 1: Conv (5×5, 6 filters) → 28×28×6
         + Tanh + AvgPool (2×2) → 14×14×6
Layer 2: Conv (5×5, 16 filters) → 10×10×16
         + Tanh + AvgPool (2×2) → 5×5×16
Layer 3: Conv (5×5, 120 filters) → 1×1×120
         + Tanh
Layer 4: FC (120 → 84) + Tanh
Layer 5: FC (84 → 10) (output)
Parameters:
- Conv1: 5×5×1×6 + 6 = 156
- Conv2: 5×5×6×16 + 16 = 2,416
- Conv3: 5×5×16×120 + 120 = 48,120
- FC1: 120×84 + 84 = 10,164
- FC2: 84×10 + 10 = 850
- Total: 61,706 parameters
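The per-layer counts above can be reproduced with a few lines of Python (helper names are illustrative):

```python
def conv_params(k, c_in, c_out):
    """Weights k*k*c_in*c_out plus one bias per output channel."""
    return k * k * c_in * c_out + c_out

def fc_params(n_in, n_out):
    """Weights n_in*n_out plus one bias per output unit."""
    return n_in * n_out + n_out

layers = {
    "Conv1": conv_params(5, 1, 6),     # 156
    "Conv2": conv_params(5, 6, 16),    # 2,416
    "Conv3": conv_params(5, 16, 120),  # 48,120
    "FC1": fc_params(120, 84),         # 10,164
    "FC2": fc_params(84, 10),          # 850
}
total = sum(layers.values())
print(total)  # 61706
```

Note this follows the simplified modern description above; the original paper's C3 layer uses a sparse connection table between the 6 input and 16 output maps, so its true count is slightly lower.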
File Structure
03_CNN_LeNet/
├── README.md                    # This file
├── numpy/
│   ├── conv_numpy.py            # NumPy convolution implementation
│   ├── pooling_numpy.py         # NumPy pooling implementation
│   └── lenet_numpy.py           # Complete LeNet NumPy implementation
├── pytorch_lowlevel/
│   └── lenet_lowlevel.py        # Using F.conv2d, not nn.Conv2d
├── paper/
│   └── lenet_paper.py           # Exact paper architecture reproduction
└── exercises/
    ├── 01_visualize_filters.md  # Filter visualization
    └── 02_receptive_field.md    # Receptive field calculation
Core Concepts
1. Local Connectivity
Fully Connected:
- Every input connects to every output
- Parameters: (H_in × W_in) × (H_out × W_out) for flattened inputs
Convolution:
- Only local region connections (kernel size)
- Parameters: K × K × C_in × C_out
- Efficient through parameter sharing
2. Parameter Sharing
Same filter applied across entire image
→ Translation equivariance
→ Detects same features at any location
3. Hierarchical Features
Layer 1: Edges, corners (low-level)
Layer 2: Textures, patterns (mid-level)
Layer 3: Object parts (high-level)
Layer 4+: Complete objects (semantic)
Implementation Levels
Level 1: NumPy From-Scratch (numpy/)
- Direct implementation of convolution with loops
- im2col optimization
- Manual backpropagation implementation
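The im2col optimization listed above can be sketched as follows: each k×k patch is unrolled into one row of a matrix, so convolution becomes a single matrix product (function names are illustrative, not the repo's actual API; single channel, no padding):

```python
import numpy as np

def im2col(x, k, s=1):
    """Unroll every k-by-k patch of a 2D input into a row of a matrix."""
    h, w = x.shape
    h_out, w_out = (h - k) // s + 1, (w - k) // s + 1
    cols = np.zeros((h_out * w_out, k * k))
    for i in range(h_out):
        for j in range(w_out):
            cols[i * w_out + j] = x[i*s:i*s+k, j*s:j*s+k].ravel()
    return cols

def conv2d_im2col(x, kernel, s=1):
    """Convolution as one matrix-vector product after im2col."""
    k = kernel.shape[0]
    h_out = (x.shape[0] - k) // s + 1
    w_out = (x.shape[1] - k) // s + 1
    return (im2col(x, k, s) @ kernel.ravel()).reshape(h_out, w_out)

x = np.arange(16.0).reshape(4, 4)
print(conv2d_im2col(x, np.ones((2, 2))).shape)  # (3, 3)
```

With multiple channels and filters, the flattened kernels form a matrix, and the whole layer becomes one GEMM call, which is why im2col is so much faster than the loop version.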
Level 2: PyTorch Low-Level (pytorch_lowlevel/)
- Use F.conv2d, F.max_pool2d
- Don't use nn.Conv2d
- Manual parameter management
Level 3: Paper Implementation (paper/)
- Reproduce original paper architecture
- Tanh activation (instead of ReLU)
- Average Pooling (instead of Max)
Learning Checklist
- [ ] Understand convolution formula
- [ ] Memorize output size calculation formula
- [ ] Understand im2col technique
- [ ] Derive conv backward
- [ ] Understand max pooling backward
- [ ] Memorize LeNet architecture
References
- LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). "Gradient-Based Learning Applied to Document Recognition". Proceedings of the IEEE, 86(11), 2278–2324.
- CS231n: Convolutional Neural Networks
- ../Deep_Learning/08_CNN_Basics.md