08. Advanced CNN - Famous Architectures¶


Learning Objectives¶

Understand the VGG, ResNet, and EfficientNet architectures
Learn how skip connections and residual learning work
Understand why very deep networks are hard to train and how these architectures address it
Implement the key building blocks in PyTorch

1. VGG (2014)¶

Core Ideas¶

Use only small filters (3×3)
Improve performance by increasing depth
Simple and consistent structure
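
Stacking small filters pays off in two ways: two stacked 3×3 convolutions see the same 5×5 region as a single 5×5 convolution (three see 7×7), while using fewer weights and adding a non-linearity between layers. The arithmetic can be checked with a short sketch. It assumes the same number of input and output channels and ignores biases; the helper names and the channel count are illustrative only.

```python
def conv_weights(kernel_size, channels, num_layers=1):
    """Weights in `num_layers` stacked k x k convs with `channels` in and out (no bias)."""
    return num_layers * kernel_size * kernel_size * channels * channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convs of the same kernel size."""
    return 1 + num_layers * (kernel_size - 1)


C = 64  # illustrative channel count
for layers, big_kernel in [(2, 5), (3, 7)]:
    small = conv_weights(3, C, num_layers=layers)
    big = conv_weights(big_kernel, C)
    rf = stacked_receptive_field(3, layers)
    print(f"{layers} x (3x3): RF {rf}x{rf}, {small:,} weights "
          f"vs one {big_kernel}x{big_kernel}: {big:,} weights")
```

With 64 channels, two 3×3 layers need 73,728 weights against 102,400 for one 5×5 layer, and three need 110,592 against 200,704 for one 7×7 layer.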

Architecture (VGG16)¶

Input 224×224×3
  ↓
Conv 3×3, 64 ×2 → MaxPool → 112×112×64
  ↓
Conv 3×3, 128 ×2 → MaxPool → 56×56×128
  ↓
Conv 3×3, 256 ×3 → MaxPool → 28×28×256
  ↓
Conv 3×3, 512 ×3 → MaxPool → 14×14×512
  ↓
Conv 3×3, 512 ×3 → MaxPool → 7×7×512
  ↓
FC 4096 → FC 4096 → FC 1000
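
The shapes in the diagram, and the parameter count usually quoted for VGG16, can be reproduced with plain arithmetic. The sketch below walks the five stages without building the network; `STAGES` and `trace_vgg16` are illustrative names, and biases are included in the count.

```python
# (out_channels, number of 3x3 convs) for each of the five VGG16 stages
STAGES = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]


def trace_vgg16(input_size=224, in_channels=3, num_classes=1000):
    """Return (per-stage output shapes, total parameter count including biases)."""
    size, channels, params, shapes = input_size, in_channels, 0, []
    for out_channels, num_convs in STAGES:
        for _ in range(num_convs):
            # A 3x3 conv with padding=1 keeps the spatial size
            params += 3 * 3 * channels * out_channels + out_channels
            channels = out_channels
        size //= 2  # 2x2 max pool with stride 2 halves height and width
        shapes.append((size, size, channels))
    fc_in = size * size * channels  # flattened 7*7*512 = 25088 features
    for fc_out in (4096, 4096, num_classes):
        params += fc_in * fc_out + fc_out
        fc_in = fc_out
    return shapes, params


shapes, total = trace_vgg16()
print(shapes[-1])    # (7, 7, 512)
print(f"{total:,}")  # 138,357,544
```

Roughly 124M of the 138M parameters sit in the three fully connected layers, which is why VGG is so memory-intensive.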

PyTorch Implementation¶

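import torch.nn as nn

def make_vgg_block(in_ch, out_ch, num_convs):
    """num_convs × (3×3 conv + ReLU), followed by a 2×2 max pool."""
    layers = []
    for i in range(num_convs):
        # Only the first conv in the block changes the channel count
        layers.append(nn.Conv2d(
            in_ch if i == 0 else out_ch,
            out_ch, 3, padding=1  # padding=1 keeps the spatial size
        ))
        layers.append(nn.ReLU())
    layers.append(nn.MaxPool2d(2, 2))  # halves height and width
    return nn.Sequential(*layers)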

class VGG16(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            make_vgg_block(3, 64, 2),
            make_vgg_block(64, 128, 2),
            make_vgg_block(128, 256, 3),
            make_vgg_block(256, 512, 3),
            make_vgg_block(512, 512, 3),
        )
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)    # (N, 3, 224, 224) -> (N, 512, 7, 7)
        x = x.flatten(1)        # (N, 512, 7, 7) -> (N, 25088)
        x = self.classifier(x)  # (N, 25088) -> (N, num_classes)
        return x

2. ResNet (2015)¶

Problem: Vanishing Gradients¶

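Gradients tend to vanish as the network gets deeper, which makes optimization harder
Simply stacking more layers degrades accuracy, even on the training set (the "degradation problem" reported in the ResNet paper)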

Solution: Residual Connection¶

        ┌─────────────────┐
        │                 │
x ──────┼───► Conv ──► Conv ──►(+)──► ReLU ──► Output
        │                 ↑
        └────────(identity)┘

Output = F(x) + x   (Residual Learning)

Key Insight¶

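Learning the identity function becomes easy: the block only has to push F(x) toward 0
Gradients flow directly through the skip connections
Networks with 1000+ layers become trainable (the ResNet authors trained a 1202-layer model on CIFAR-10)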
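
Both claims about the identity function and gradient flow can be checked numerically. The sketch below is a toy setup, not the ResNet block itself: linear layers stand in for the convolutions, and the branch weights are zeroed so that F(x) = 0. Under those assumptions the block reduces to the identity, and the gradient reaching x through the skip path is exactly 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# F(x): a small residual branch (linear layers stand in for the conv layers)
branch = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
for p in branch.parameters():
    nn.init.zeros_(p)  # forces F(x) = 0 for every input

x = torch.randn(4, 8, requires_grad=True)
out = branch(x) + x  # Output = F(x) + x

# With F(x) = 0 the block is exactly the identity mapping
is_identity = torch.equal(out, x)

# d(out)/dx = dF/dx + 1; the branch contributes 0 here, the skip path contributes 1
out.sum().backward()
grad_is_one = torch.allclose(x.grad, torch.ones_like(x))

print(is_identity, grad_is_one)
```

In a plain stack without the `+ x`, the same zeroed layers would block the signal and the gradient entirely.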

PyTorch Implementation¶

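import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """ResNet basic block: two 3×3 convs plus a skip connection (ResNet-18/34)"""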
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample

    def forward(self, x):
        identity = x

        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity  # Skip connection!
        out = F.relu(out)
        return out
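
The `downsample` argument matters whenever a block changes the channel count or uses `stride > 1`: the identity then no longer has the same shape as the main path, so it needs a projection before the addition. A minimal sketch of such a projection shortcut, assuming the common 1×1 conv + BatchNorm choice and illustrative sizes (64 → 128 channels, stride 2):

```python
import torch
import torch.nn as nn

in_channels, out_channels, stride = 64, 128, 2

# Main path of a basic block: the first 3x3 conv carries the stride
main_path = nn.Sequential(
    nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False),
    nn.BatchNorm2d(out_channels),
    nn.ReLU(),
    nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False),
    nn.BatchNorm2d(out_channels),
)

# Projection shortcut: 1x1 conv with the same stride, so shapes match at the addition
downsample = nn.Sequential(
    nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
    nn.BatchNorm2d(out_channels),
)

x = torch.randn(2, in_channels, 56, 56)
main_out = main_path(x)
shortcut = downsample(x)
out = torch.relu(main_out + shortcut)  # the addition works because the shapes agree

print(main_out.shape, shortcut.shape)
```

A module built like `downsample` here is what would be passed as the `downsample` argument of the first block in a stage.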

Bottleneck Block (ResNet-50+)¶

class Bottleneck(nn.Module):
    """1×1 (reduce) → 3×3 → 1×1 (expand) structure.

    The block outputs out_channels * expansion channels.
    """
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv3 = nn.Conv2d(out_channels, out_channels * 4, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * 4)
        self.downsample = downsample

    def forward(self, x):
        identity = x

        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = F.relu(out)
        return out
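
The point of the bottleneck is cost: the expensive 3×3 convolution runs at a reduced width, and the 1×1 convolutions handle the channel changes. A rough count of convolution weights, as a sketch that ignores BatchNorm and uses a 256-channel block with width 64 as an illustrative case:

```python
def conv_weights(in_ch, out_ch, kernel_size):
    """Weight count of a bias-free convolution."""
    return in_ch * out_ch * kernel_size * kernel_size


width, expansion = 64, 4
channels = width * expansion  # 256 channels entering and leaving the block

bottleneck = (
    conv_weights(channels, width, 1)    # 1x1 reduce: 256 -> 64
    + conv_weights(width, width, 3)     # 3x3 at the reduced width
    + conv_weights(width, channels, 1)  # 1x1 expand: 64 -> 256
)
# For comparison: two 3x3 convs applied directly at 256 channels
two_3x3 = 2 * conv_weights(channels, channels, 3)

print(bottleneck, two_3x3, round(two_3x3 / bottleneck, 1))
```

The bottleneck needs 69,632 weights where two full-width 3×3 convolutions would need 1,179,648, roughly 17 times more.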

3. ResNet Variants¶

Pre-activation ResNet¶

Original: x → Conv → BN → ReLU → Conv → BN → (+) → ReLU
Pre-act: x → BN → ReLU → Conv → BN → ReLU → Conv → (+)
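
In code, the pre-activation ordering moves BatchNorm and ReLU in front of each convolution and drops the ReLU after the addition. A minimal sketch, assuming equal input and output channels and stride 1 so that no projection is needed (`PreActBlock` is an illustrative name):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PreActBlock(nn.Module):
    """Pre-activation basic block: (BN -> ReLU -> Conv) twice, then add."""

    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        # No ReLU after the addition: the shortcut path stays a pure identity
        return out + x


block = PreActBlock(16)
y = block(torch.randn(2, 16, 8, 8))
print(y.shape)
```

Because nothing is applied after the addition, the signal on the shortcut passes from block to block unchanged.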

ResNeXt¶

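# Replace the Bottleneck's 3×3 conv with a grouped convolution (cardinality = 32).
# out_channels must be divisible by groups.
self.conv2 = nn.Conv2d(out_channels, out_channels, 3, stride=stride,
                       padding=1, groups=32, bias=False)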
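
A grouped convolution splits the channels into independent groups, so each filter only sees `in_channels / groups` input channels. The effect on the weight count can be read directly from the layer, as in this sketch with an illustrative width of 128 channels:

```python
import torch.nn as nn

channels, groups = 128, 32

dense = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
grouped = nn.Conv2d(channels, channels, 3, padding=1, groups=groups, bias=False)

dense_weights = dense.weight.numel()      # 128 * 128 * 3 * 3
grouped_weights = grouped.weight.numel()  # 128 * (128 / 32) * 3 * 3

print(dense_weights, grouped_weights, dense_weights // grouped_weights)
```

The weight count drops by a factor equal to the number of groups, which is what lets ResNeXt use wider layers at a similar cost.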

SE-ResNet (Squeeze-and-Excitation)¶

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        self.excitation = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.squeeze(x).view(b, c)
        y = self.excitation(y).view(b, c, 1, 1)
        return x * y  # Channel recalibration

4. EfficientNet (2019)¶

Core Ideas¶

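Scale depth, width, and input resolution together, in balance
Compound scaling: a single coefficient φ controls all three

depth:      d = α^φ
width:      w = β^φ
resolution: r = γ^φ

α × β² × γ² ≈ 2, with α, β, γ ≥ 1 (computation constraint: FLOPs grow by roughly 2^φ)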
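
A worked example makes the constraint concrete. The sketch below uses the base coefficients reported in the EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15) and prints the resulting multipliers for a few values of φ. The released B1 to B7 models use rounded settings, so treat the numbers as illustrative; `compound_scale` is a hypothetical helper.

```python
# Base coefficients reported for EfficientNet (found by a small grid search at phi = 1)
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Return (depth, width, resolution, FLOPs) multipliers for coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # FLOPs scale linearly with depth and quadratically with width and resolution
    flops = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, flops


for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f}, FLOPs x{f:.2f}")
```

For φ = 1 the product is 1.2 × 1.1² × 1.15² ≈ 1.92, so each step of φ roughly doubles the compute.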

MBConv Block¶

class MBConv(nn.Module):
    """Mobile Inverted Bottleneck"""
    def __init__(self, in_ch, out_ch, expand_ratio, stride, se_ratio=0.25):
        super().__init__()
        hidden = in_ch * expand_ratio

        self.expand = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.SiLU()
        ) if expand_ratio != 1 else nn.Identity()

        self.depthwise = nn.Sequential(
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.SiLU()
        )

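        # SE bottleneck width is derived from the block's input channels;
        # SEBlock expects a reduction ratio, so convert the width into one
        squeezed = max(1, int(in_ch * se_ratio))
        self.se = SEBlock(hidden, reduction=max(1, hidden // squeezed))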

        self.project = nn.Sequential(
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch)
        )

        self.use_skip = stride == 1 and in_ch == out_ch

    def forward(self, x):
        out = self.expand(x)
        out = self.depthwise(out)
        out = self.se(out)
        out = self.project(out)
        if self.use_skip:
            out = out + x
        return out
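
MBConv can afford a wide hidden layer because the 3×3 convolution is depthwise: each channel gets a single filter, and the channel mixing is left to the 1×1 convolutions. A rough weight count shows the difference. This sketch excludes BatchNorm and the SE block, and the channel sizes and helper names are illustrative.

```python
def mbconv_conv_weights(in_ch, out_ch, expand_ratio, kernel_size=3):
    """Conv weights in expand -> depthwise -> project (BatchNorm and SE excluded)."""
    hidden = in_ch * expand_ratio
    expand = in_ch * hidden if expand_ratio != 1 else 0  # 1x1 expansion
    depthwise = hidden * kernel_size * kernel_size       # one k x k filter per channel
    project = hidden * out_ch                            # 1x1 projection
    return expand + depthwise + project


def dense_conv_weights(in_ch, out_ch, kernel_size=3):
    """Weights of a standard k x k convolution."""
    return in_ch * out_ch * kernel_size * kernel_size


in_ch, out_ch, expand_ratio = 24, 24, 6
hidden = in_ch * expand_ratio  # 144 channels inside the block

print(mbconv_conv_weights(in_ch, out_ch, expand_ratio))  # 24*144 + 144*9 + 144*24
print(dense_conv_weights(hidden, hidden))                # 144*144*9
```

The whole MBConv block uses 8,208 convolution weights, while a single standard 3×3 convolution at the hidden width of 144 channels would use 186,624.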

5. Architecture Comparison¶

Model	Parameters	ImageNet Top-1 Acc	Features
VGG16	138M	71.5%	Simple, memory-intensive
ResNet-50	26M	76.0%	Skip connections
ResNet-152	60M	78.3%	Deeper version of ResNet-50
EfficientNet-B0	5.3M	77.1%	High accuracy per parameter
EfficientNet-B7	66M	84.3%	Highest accuracy in this table

6. torchvision Pretrained Models¶

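import torch.nn as nn
import torchvision.models as models

# Load pretrained models (the weights= argument requires torchvision >= 0.13)
resnet50 = models.resnet50(weights='IMAGENET1K_V2')
efficientnet = models.efficientnet_b0(weights='IMAGENET1K_V1')
vgg16 = models.vgg16(weights='IMAGENET1K_V1')

# Feature extraction: use BatchNorm running statistics and freeze all pretrained weights
resnet50.eval()
for param in resnet50.parameters():
    param.requires_grad = False

# Replace last layer (transfer learning); the new layer is trainable by default
resnet50.fc = nn.Linear(resnet50.fc.in_features, 10)  # 2048 -> 10 classes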

7. Model Selection Guide¶

Recommendations by Use Case¶

Situation	Recommended Model
Fast inference needed	MobileNet, EfficientNet-B0
High accuracy needed	EfficientNet-B4~B7
Educational/understanding	VGG, ResNet-18
Memory constraints	MobileNet, ShuffleNet

Practical Tips¶

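import torch
from thop import profile  # third-party package: pip install thop

# Check model size (trainable parameters only)
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Estimate compute with thop; `model` is any nn.Module
# Note: thop counts multiply-accumulate operations (MACs); FLOPs ≈ 2 × MACs
macs, params = profile(model, inputs=(torch.randn(1, 3, 224, 224),))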

Summary¶

Core Concepts¶

VGG: Depth built from repeated small (3×3) filters
ResNet: Skip connections keep gradients flowing, so very deep networks stay trainable
EfficientNet: Compound scaling of depth, width, and resolution

Evolution¶

LeNet (1998)
  ↓
AlexNet (2012) - GPU usage
  ↓
VGG (2014) - Deeper
  ↓
GoogLeNet (2014) - Inception module
  ↓
ResNet (2015) - Skip Connection
  ↓
EfficientNet (2019) - Compound Scaling
  ↓
Vision Transformer (2020) - Attention

Next Steps¶

In 09_Transfer_Learning.md, we'll learn transfer learning using pretrained models.