10. CNN (LeNet)



Overview

LeNet-5, proposed by Yann LeCun et al. in 1998, is one of the earliest successful Convolutional Neural Networks. It achieved excellent performance on handwritten digit recognition (MNIST) and laid the foundation for modern CNNs.


Mathematical Background

1. Convolution Operation

2D Convolution:
(I * K)[i,j] = Σ_m Σ_n I[i+m, j+n] · K[m, n]

(Strictly, this is cross-correlation; true convolution flips the kernel. Deep-learning frameworks use the cross-correlation form, and the distinction does not matter for learned filters.)

Where:
- I: input image (H × W)
- K: kernel/filter (k_h × k_w)
- *: convolution operation
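The formula maps directly onto two nested loops over output positions. A minimal single-channel NumPy sketch (stride 1, no padding; the function name is illustrative, not from the repo's files):

```python
import numpy as np

def conv2d_valid(I, K):
    """'Valid' 2D convolution (cross-correlation form), stride 1, no padding."""
    H, W = I.shape
    kh, kw = K.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Inner double sum of the formula, vectorized as an elementwise product
            out[i, j] = np.sum(I[i:i+kh, j:j+kw] * K)
    return out

I = np.arange(16, dtype=float).reshape(4, 4)
K = np.ones((3, 3))
Y = conv2d_valid(I, K)
print(Y.shape)  # (2, 2)
```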

Output size:
H_out = (H_in + 2P - K) / S + 1
W_out = (W_in + 2P - K) / S + 1

- P: padding
- S: stride
- K: kernel size
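The output-size formula can be checked against LeNet-5's own shapes with a small helper (name illustrative):

```python
def conv_out_size(n_in, k, p=0, s=1):
    """(N_in + 2P - K) // S + 1, per the formula above."""
    return (n_in + 2 * p - k) // s + 1

# LeNet-5 Conv1: 32x32 input, 5x5 kernel, no padding, stride 1
print(conv_out_size(32, 5))       # 28
# 2x2 pooling with stride 2 halves the resolution
print(conv_out_size(28, 2, s=2))  # 14
```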

2. Pooling Operation

Max Pooling:
y[i,j] = max(x[i*s:i*s+k, j*s:j*s+k])

Average Pooling:
y[i,j] = mean(x[i*s:i*s+k, j*s:j*s+k])

Purpose:
1. Reduce spatial resolution (down-sampling)
2. Increase translation invariance
3. Reduce parameters/computation
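Max pooling as defined above is again a pair of loops over output positions. A minimal single-channel sketch (function name illustrative):

```python
import numpy as np

def max_pool2d(x, k=2, s=2):
    """Naive max pooling on a single-channel feature map."""
    H, W = x.shape
    H_out, W_out = (H - k) // s + 1, (W - k) // s + 1
    out = np.zeros((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            # Maximum over the kxk window starting at (i*s, j*s)
            out[i, j] = x[i*s:i*s+k, j*s:j*s+k].max()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(x))  # 2x2 map of block maxima
```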

3. Backpropagation through Convolution

Forward:
Y = X * W + b

Backward:

∂L/∂W = X * ∂L/∂Y  (cross-correlation)

∂L/∂X = ∂L/∂Y * rot180(W)  (full convolution)

∂L/∂b = Σ ∂L/∂Y
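The ∂L/∂W rule can be sanity-checked numerically. A minimal single-channel sketch, taking L = Σ(Y · ∂L/∂Y) so that L is linear in W and finite differences are exact up to float error:

```python
import numpy as np

def conv2d_valid(I, K):
    H, W = I.shape; kh, kw = K.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i+kh, j:j+kw] * K)
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
W = rng.standard_normal((3, 3))
dY = rng.standard_normal((3, 3))   # upstream gradient; L = sum(Y * dY)

# dL/dW = X cross-correlated with dL/dY (shape matches W: 3x3)
dW = conv2d_valid(X, dY)

# Finite-difference check of one entry
eps = 1e-6
W2 = W.copy(); W2[0, 0] += eps
num = (np.sum(conv2d_valid(X, W2) * dY) - np.sum(conv2d_valid(X, W) * dY)) / eps
print(np.allclose(dW[0, 0], num, atol=1e-4))  # True
```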

LeNet-5 Architecture

Input: 32×32 grayscale image

Layer 1: Conv (5×5, 6 filters) → 28×28×6
         + Tanh + AvgPool (2×2) → 14×14×6

Layer 2: Conv (5×5, 16 filters) → 10×10×16
         + Tanh + AvgPool (2×2) → 5×5×16

Layer 3: Conv (5×5, 120 filters) → 1×1×120
         + Tanh

Layer 4: FC (120 → 84) + Tanh

Layer 5: FC (84 → 10) (output)

Parameters:
- Conv1: 5×5×1×6 + 6 = 156
- Conv2: 5×5×6×16 + 16 = 2,416
- Conv3: 5×5×16×120 + 120 = 48,120
- FC1: 120×84 + 84 = 10,164
- FC2: 84×10 + 10 = 850
- Total: 61,706 parameters
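The counts above can be verified with a few lines of arithmetic. Note that the Conv2 figure assumes full connectivity between input and output channels; the 1998 paper's sparse C3 connection table gives 1,516 instead:

```python
def conv_params(k, c_in, c_out):
    """k x k kernel per (input, output) channel pair, plus one bias per filter."""
    return k * k * c_in * c_out + c_out

def fc_params(n_in, n_out):
    """One weight per input-output pair, plus one bias per output."""
    return n_in * n_out + n_out

counts = [
    conv_params(5, 1, 6),     # Conv1: 156
    conv_params(5, 6, 16),    # Conv2: 2,416 (fully connected channels)
    conv_params(5, 16, 120),  # Conv3: 48,120
    fc_params(120, 84),       # FC1: 10,164
    fc_params(84, 10),        # FC2: 850
]
print(counts, sum(counts))    # total: 61706
```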

File Structure

03_CNN_LeNet/
├── README.md                      # This file
├── numpy/
│   ├── conv_numpy.py             # NumPy Convolution implementation
│   ├── pooling_numpy.py          # NumPy Pooling implementation
│   └── lenet_numpy.py            # Complete LeNet NumPy implementation
├── pytorch_lowlevel/
│   └── lenet_lowlevel.py         # Using F.conv2d, not nn.Conv2d
├── paper/
│   └── lenet_paper.py            # Exact paper architecture reproduction
└── exercises/
    ├── 01_visualize_filters.md   # Filter visualization
    └── 02_receptive_field.md     # Receptive field calculation

Core Concepts

1. Local Connectivity

Fully Connected:
- Every input connects to every output
- Parameters: H_in × W_in × H_out × W_out

Convolution:
- Only local region connections (kernel size)
- Parameters: K × K × C_in × C_out
- Efficient through parameter sharing
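To make the gap concrete, compare densely connecting two 28×28 single-channel maps against one shared 5×5 kernel:

```python
# Fully connected: one weight per (input pixel, output pixel) pair
fc = (28 * 28) * (28 * 28)
# Convolution: one shared 5x5 kernel (C_in = C_out = 1)
conv = 5 * 5 * 1 * 1
print(fc, conv)  # 614656 25
```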

2. Parameter Sharing

Same filter applied across entire image
→ Translation equivariance
→ Detects same features at any location
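Translation equivariance can be checked directly: shifting the input shifts the feature map by the same amount. A minimal single-channel sketch (valid convolution, loop form):

```python
import numpy as np

def conv2d_valid(I, K):
    H, W = I.shape; kh, kw = K.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i+kh, j:j+kw] * K)
    return out

K = np.array([[1., 0.], [0., -1.]])
I = np.zeros((6, 6)); I[1, 1] = 1.0              # a single "feature"
I_shift = np.roll(I, shift=(2, 2), axis=(0, 1))  # same feature, moved

Y, Y_shift = conv2d_valid(I, K), conv2d_valid(I_shift, K)
# Shifting the input shifts the response by the same amount
print(np.allclose(np.roll(Y, (2, 2), axis=(0, 1)), Y_shift))  # True
```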

3. Hierarchical Features

Layer 1: Edges, corners (low-level)
Layer 2: Textures, patterns (mid-level)
Layer 3: Object parts (high-level)
Layer 4+: Complete objects (semantic)

Implementation Levels

Level 1: NumPy From-Scratch (numpy/)

  • Direct implementation of convolution with loops
  • im2col optimization
  • Manual backpropagation implementation

Level 2: PyTorch Low-Level (pytorch_lowlevel/)

  • Use F.conv2d, F.max_pool2d
  • Don't use nn.Conv2d
  • Manual parameter management

Level 3: Paper Implementation (paper/)

  • Reproduce original paper architecture
  • Tanh activation (instead of ReLU)
  • Average Pooling (instead of Max)

Learning Checklist

  • [ ] Understand convolution formula
  • [ ] Memorize output size calculation formula
  • [ ] Understand im2col technique
  • [ ] Derive conv backward
  • [ ] Understand max pooling backward
  • [ ] Memorize LeNet architecture
