10. CNN (LeNet)



Overview

LeNet-5, proposed by Yann LeCun et al. in 1998, is one of the earliest successful Convolutional Neural Networks. It achieved excellent performance on handwritten digit recognition (MNIST) and laid the foundation for modern CNNs.


Mathematical Background

1. Convolution Operation

2D Convolution:
(I * K)[i,j] = Σ_m Σ_n I[i+m, j+n] · K[m, n]

(Strictly, this is cross-correlation; true convolution flips the kernel. Deep-learning frameworks use the cross-correlation form, and the distinction does not matter for learned filters.)

Where:
- I: input image (H × W)
- K: kernel/filter (k_h × k_w)
- *: convolution operation
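The formula maps directly onto two nested loops over output positions. A minimal single-channel NumPy sketch (stride 1, no padding; the function name is illustrative, not from the repo's files):

```python
import numpy as np

def conv2d_valid(I, K):
    """'Valid' 2D convolution (cross-correlation form), stride 1, no padding."""
    H, W = I.shape
    kh, kw = K.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Inner double sum of the formula, vectorized as an elementwise product
            out[i, j] = np.sum(I[i:i+kh, j:j+kw] * K)
    return out

I = np.arange(16, dtype=float).reshape(4, 4)
K = np.ones((3, 3))
Y = conv2d_valid(I, K)
print(Y.shape)  # (2, 2)
```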

Output size:
H_out = (H_in + 2P - K) / S + 1
W_out = (W_in + 2P - K) / S + 1

- P: padding
- S: stride
- K: kernel size
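The output-size formula can be checked against LeNet-5's own shapes with a small helper (name illustrative):

```python
def conv_out_size(n_in, k, p=0, s=1):
    """(N_in + 2P - K) // S + 1, per the formula above."""
    return (n_in + 2 * p - k) // s + 1

# LeNet-5 Conv1: 32x32 input, 5x5 kernel, no padding, stride 1
print(conv_out_size(32, 5))       # 28
# 2x2 pooling with stride 2 halves the resolution
print(conv_out_size(28, 2, s=2))  # 14
```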

2. Pooling Operation

Max Pooling:
y[i,j] = max(x[i*s:i*s+k, j*s:j*s+k])

Average Pooling:
y[i,j] = mean(x[i*s:i*s+k, j*s:j*s+k])

Purpose:
1. Reduce spatial resolution (down-sampling)
2. Increase translation invariance
3. Reduce parameters/computation
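Max pooling as defined above is again a pair of loops over output positions. A minimal single-channel sketch (function name illustrative):

```python
import numpy as np

def max_pool2d(x, k=2, s=2):
    """Naive max pooling on a single-channel feature map."""
    H, W = x.shape
    H_out, W_out = (H - k) // s + 1, (W - k) // s + 1
    out = np.zeros((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            # Maximum over the kxk window starting at (i*s, j*s)
            out[i, j] = x[i*s:i*s+k, j*s:j*s+k].max()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(x))  # 2x2 map of block maxima
```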

3. Backpropagation through Convolution

Forward:
Y = X * W + b

Backward:

∂L/∂W = X * ∂L/∂Y  (cross-correlation)

∂L/∂X = ∂L/∂Y * rot180(W)  (full convolution)

∂L/∂b = Σ ∂L/∂Y
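The ∂L/∂W rule can be sanity-checked numerically. A minimal single-channel sketch, taking L = Σ(Y · ∂L/∂Y) so that L is linear in W and finite differences are exact up to float error:

```python
import numpy as np

def conv2d_valid(I, K):
    H, W = I.shape; kh, kw = K.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i+kh, j:j+kw] * K)
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
W = rng.standard_normal((3, 3))
dY = rng.standard_normal((3, 3))   # upstream gradient; L = sum(Y * dY)

# dL/dW = X cross-correlated with dL/dY (shape matches W: 3x3)
dW = conv2d_valid(X, dY)

# Finite-difference check of one entry
eps = 1e-6
W2 = W.copy(); W2[0, 0] += eps
num = (np.sum(conv2d_valid(X, W2) * dY) - np.sum(conv2d_valid(X, W) * dY)) / eps
print(np.allclose(dW[0, 0], num, atol=1e-4))  # True
```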

LeNet-5 Architecture

Input: 32×32 grayscale image

Layer 1: Conv (5×5, 6 filters) → 28×28×6
         + Tanh + AvgPool (2×2) → 14×14×6

Layer 2: Conv (5×5, 16 filters) → 10×10×16
         + Tanh + AvgPool (2×2) → 5×5×16

Layer 3: Conv (5×5, 120 filters) → 1×1×120
         + Tanh

Layer 4: FC (120 → 84) + Tanh

Layer 5: FC (84 → 10) (output)

Parameters:
- Conv1: 5×5×1×6 + 6 = 156
- Conv2: 5×5×6×16 + 16 = 2,416
- Conv3: 5×5×16×120 + 120 = 48,120
- FC1: 120×84 + 84 = 10,164
- FC2: 84×10 + 10 = 850
- Total: 61,706 parameters
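The counts above can be verified with a few lines of arithmetic. Note that the Conv2 figure assumes full connectivity between input and output channels; the 1998 paper's sparse C3 connection table gives 1,516 instead:

```python
def conv_params(k, c_in, c_out):
    """k x k kernel per (input, output) channel pair, plus one bias per filter."""
    return k * k * c_in * c_out + c_out

def fc_params(n_in, n_out):
    """One weight per input-output pair, plus one bias per output."""
    return n_in * n_out + n_out

counts = [
    conv_params(5, 1, 6),     # Conv1: 156
    conv_params(5, 6, 16),    # Conv2: 2,416 (fully connected channels)
    conv_params(5, 16, 120),  # Conv3: 48,120
    fc_params(120, 84),       # FC1: 10,164
    fc_params(84, 10),        # FC2: 850
]
print(counts, sum(counts))    # total: 61706
```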

File Structure

03_CNN_LeNet/
├── README.md                      # This file
├── numpy/
│   ├── conv_numpy.py             # NumPy Convolution implementation
│   ├── pooling_numpy.py          # NumPy Pooling implementation
│   └── lenet_numpy.py            # Complete LeNet NumPy implementation
├── pytorch_lowlevel/
│   └── lenet_lowlevel.py         # Using F.conv2d, not nn.Conv2d
├── paper/
│   └── lenet_paper.py            # Exact paper architecture reproduction
└── exercises/
    ├── 01_visualize_filters.md   # Filter visualization
    └── 02_receptive_field.md     # Receptive field calculation

Core Concepts

1. Local Connectivity

Fully Connected:
- Every input connects to every output
- Parameters: H_in × W_in × H_out × W_out

Convolution:
- Only local region connections (kernel size)
- Parameters: K × K × C_in × C_out
- Efficient through parameter sharing
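To make the gap concrete, compare densely connecting two 28×28 single-channel maps against one shared 5×5 kernel:

```python
# Fully connected: one weight per (input pixel, output pixel) pair
fc = (28 * 28) * (28 * 28)
# Convolution: one shared 5x5 kernel (C_in = C_out = 1)
conv = 5 * 5 * 1 * 1
print(fc, conv)  # 614656 25
```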

2. Parameter Sharing

Same filter applied across entire image
→ Translation equivariance
→ Detects same features at any location
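Translation equivariance can be checked directly: shifting the input shifts the feature map by the same amount. A minimal single-channel sketch (valid convolution, loop form):

```python
import numpy as np

def conv2d_valid(I, K):
    H, W = I.shape; kh, kw = K.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i+kh, j:j+kw] * K)
    return out

K = np.array([[1., 0.], [0., -1.]])
I = np.zeros((6, 6)); I[1, 1] = 1.0              # a single "feature"
I_shift = np.roll(I, shift=(2, 2), axis=(0, 1))  # same feature, moved

Y, Y_shift = conv2d_valid(I, K), conv2d_valid(I_shift, K)
# Shifting the input shifts the response by the same amount
print(np.allclose(np.roll(Y, (2, 2), axis=(0, 1)), Y_shift))  # True
```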

3. Hierarchical Features

Layer 1: Edges, corners (low-level)
Layer 2: Textures, patterns (mid-level)
Layer 3: Object parts (high-level)
Layer 4+: Complete objects (semantic)

Implementation Levels

Level 1: NumPy From-Scratch (numpy/)

  • Direct implementation of convolution with loops
  • im2col optimization
  • Manual backpropagation implementation

Level 2: PyTorch Low-Level (pytorch_lowlevel/)

  • Use F.conv2d, F.max_pool2d
  • Don't use nn.Conv2d
  • Manual parameter management

Level 3: Paper Implementation (paper/)

  • Reproduce original paper architecture
  • Tanh activation (instead of ReLU)
  • Average Pooling (instead of Max)

Learning Checklist

  • [ ] Understand convolution formula
  • [ ] Memorize output size calculation formula
  • [ ] Understand im2col technique
  • [ ] Derive conv backward
  • [ ] Understand max pooling backward
  • [ ] Memorize LeNet architecture
