05. CNN ๊ธฐ์ดˆ (Convolutional Neural Networks)

05. CNN ๊ธฐ์ดˆ (Convolutional Neural Networks)

ํ•™์Šต ๋ชฉํ‘œ

  • ํ•ฉ์„ฑ๊ณฑ ์—ฐ์‚ฐ์˜ ์›๋ฆฌ ์ดํ•ด
  • ํ’€๋ง, ํŒจ๋”ฉ, ์ŠคํŠธ๋ผ์ด๋“œ ๊ฐœ๋…
  • PyTorch๋กœ CNN ๊ตฌํ˜„
  • MNIST/CIFAR-10 ๋ถ„๋ฅ˜

1. ํ•ฉ์„ฑ๊ณฑ (Convolution) ์—ฐ์‚ฐ

๊ฐœ๋…

์ด๋ฏธ์ง€์˜ ์ง€์—ญ์  ํŒจํ„ด(์—์ง€, ํ…์Šค์ฒ˜)์„ ๊ฐ์ง€ํ•ฉ๋‹ˆ๋‹ค.

์ž…๋ ฅ ์ด๋ฏธ์ง€     ํ•„ํ„ฐ(์ปค๋„)      ์ถœ๋ ฅ
[1 2 3 4]      [1 0]          [?]
[5 6 7 8]  *   [0 1]   =
[9 0 1 2]

์ˆ˜์‹

์ถœ๋ ฅ[i,j] = ฮฃ ฮฃ ์ž…๋ ฅ[i+m, j+n] ร— ํ•„ํ„ฐ[m, n]

์ฐจ์› ๊ณ„์‚ฐ

์ถœ๋ ฅ ํฌ๊ธฐ = (์ž…๋ ฅ - ์ปค๋„ + 2ร—ํŒจ๋”ฉ) / ์ŠคํŠธ๋ผ์ด๋“œ + 1

์˜ˆ: ์ž…๋ ฅ 32ร—32, ์ปค๋„ 3ร—3, ํŒจ๋”ฉ 1, ์ŠคํŠธ๋ผ์ด๋“œ 1
    = (32 - 3 + 2) / 1 + 1 = 32
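The formula is worth wrapping in a tiny helper to sanity-check layer shapes (a minimal sketch; the helper name is our own):

```python
def conv_output_size(n, k, p=0, s=1):
    """Output size of a conv/pool layer: (n - k + 2p) // s + 1."""
    return (n - k + 2 * p) // s + 1

# The example above: 32x32 input, 3x3 kernel, padding 1, stride 1
print(conv_output_size(32, 3, p=1, s=1))  # 32

# A 2x2 max pool with stride 2 halves the size
print(conv_output_size(32, 2, p=0, s=2))  # 16
```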

2. ์ฃผ์š” ๊ฐœ๋…

ํŒจ๋”ฉ (Padding)

์ž…๋ ฅ ํ…Œ๋‘๋ฆฌ์— 0์„ ์ถ”๊ฐ€ํ•˜์—ฌ ์ถœ๋ ฅ ํฌ๊ธฐ ์œ ์ง€

padding='same': ์ถœ๋ ฅ = ์ž…๋ ฅ ํฌ๊ธฐ
padding='valid': ํŒจ๋”ฉ ์—†์Œ (์ถœ๋ ฅ < ์ž…๋ ฅ)

์ŠคํŠธ๋ผ์ด๋“œ (Stride)

ํ•„ํ„ฐ ์ด๋™ ๊ฐ„๊ฒฉ

stride=1: ํ•œ ์นธ์”ฉ ์ด๋™ (๊ธฐ๋ณธ)
stride=2: ๋‘ ์นธ์”ฉ ์ด๋™ โ†’ ์ถœ๋ ฅ ํฌ๊ธฐ ์ ˆ๋ฐ˜

ํ’€๋ง (Pooling)

๊ณต๊ฐ„ ํฌ๊ธฐ ์ถ•์†Œ, ๋ถˆ๋ณ€์„ฑ ์ฆ๊ฐ€

Max Pooling: ์˜์—ญ ๋‚ด ์ตœ๋Œ€๊ฐ’
Avg Pooling: ์˜์—ญ ๋‚ด ํ‰๊ท ๊ฐ’
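Max pooling can be written by hand in a few lines; an educational NumPy sketch (not optimized):

```python
import numpy as np

def max_pool2d(x, k=2, s=2):
    """Naive 2D max pooling: take the max of each kxk window."""
    h, w = x.shape
    oh, ow = (h - k) // s + 1, (w - k) // s + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i*s:i*s+k, j*s:j*s+k].max()
    return out

x = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 0, 1, 2],
              [3, 4, 5, 6]])
print(max_pool2d(x))  # [[6. 8.] [9. 6.]]
```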

3. CNN ๊ตฌ์กฐ

๊ธฐ๋ณธ ๊ตฌ์กฐ

์ž…๋ ฅ โ†’ [Conv โ†’ ReLU โ†’ Pool] ร— N โ†’ Flatten โ†’ FC โ†’ ์ถœ๋ ฅ

LeNet-5 (1998)

์ž…๋ ฅ (32ร—32ร—1)
  โ†“
Conv1 (5ร—5, 6์ฑ„๋„) โ†’ 28ร—28ร—6
  โ†“
MaxPool (2ร—2) โ†’ 14ร—14ร—6
  โ†“
Conv2 (5ร—5, 16์ฑ„๋„) โ†’ 10ร—10ร—16
  โ†“
MaxPool (2ร—2) โ†’ 5ร—5ร—16
  โ†“
Flatten โ†’ 400
  โ†“
FC โ†’ 120 โ†’ 84 โ†’ 10

4. PyTorch Conv2d

Basic usage

import torch
import torch.nn as nn

# Conv2d(in_channels, out_channels, kernel_size, stride, padding)
conv = nn.Conv2d(
    in_channels=3,      # RGB image
    out_channels=64,    # 64 filters
    kernel_size=3,      # 3×3 kernel
    stride=1,
    padding=1           # same padding (for a 3×3 kernel)
)

# Input: (batch, channels, height, width)
x = torch.randn(1, 3, 32, 32)
out = conv(x)  # (1, 64, 32, 32)

MaxPool2d

pool = nn.MaxPool2d(kernel_size=2, stride=2)
# 32ร—32 โ†’ 16ร—16

5. MNIST CNN ๊ตฌํ˜„

๋ชจ๋ธ ์ •์˜

class MNISTNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Conv ๋ธ”๋ก 1
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.pool1 = nn.MaxPool2d(2, 2)

        # Conv ๋ธ”๋ก 2
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.pool2 = nn.MaxPool2d(2, 2)

        # FC ๋ธ”๋ก
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        # x: (batch, 1, 28, 28)
        x = F.relu(self.conv1(x))  # (batch, 32, 28, 28)
        x = self.pool1(x)          # (batch, 32, 14, 14)

        x = F.relu(self.conv2(x))  # (batch, 64, 14, 14)
        x = self.pool2(x)          # (batch, 64, 7, 7)

        x = x.view(-1, 64 * 7 * 7) # Flatten
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

ํ•™์Šต ์ฝ”๋“œ

import torch
import torch.nn as nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# ๋ฐ์ดํ„ฐ ๋กœ๋“œ
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_data = datasets.MNIST('data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_data, batch_size=64, shuffle=True)

# ๋ชจ๋ธ, ์†์‹ค, ์˜ตํ‹ฐ๋งˆ์ด์ €
model = MNISTNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# ํ•™์Šต
for epoch in range(5):
    for images, labels in train_loader:
        outputs = model(images)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
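After training, performance is usually measured on the held-out test split; a small helper sketch (the function name is our own):

```python
import torch

def accuracy(model, loader):
    """Fraction of examples the model classifies correctly."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total
```

With the training code above, pass `datasets.MNIST('data', train=False, download=True, transform=transform)` wrapped in a DataLoader.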

6. ํŠน์ง• ๋งต ์‹œ๊ฐํ™”

def visualize_feature_maps(model, image):
    """์ฒซ ๋ฒˆ์งธ Conv ์ธต์˜ ํŠน์ง• ๋งต ์‹œ๊ฐํ™”"""
    model.eval()
    with torch.no_grad():
        # ์ฒซ ๋ฒˆ์งธ Conv ์ถœ๋ ฅ
        x = model.conv1(image)
        x = F.relu(x)

    # ๊ทธ๋ฆฌ๋“œ๋กœ ํ‘œ์‹œ
    fig, axes = plt.subplots(4, 8, figsize=(12, 6))
    for i, ax in enumerate(axes.flat):
        if i < x.shape[1]:
            ax.imshow(x[0, i].cpu().numpy(), cmap='viridis')
        ax.axis('off')
    plt.tight_layout()
    plt.savefig('feature_maps.png')

7. NumPy๋กœ ํ•ฉ์„ฑ๊ณฑ ์ดํ•ด (์ฐธ๊ณ )

def conv2d_numpy(image, kernel):
    """NumPy๋กœ 2D ํ•ฉ์„ฑ๊ณฑ ๊ตฌํ˜„ (๊ต์œก์šฉ)"""
    h, w = image.shape
    kh, kw = kernel.shape
    oh, ow = h - kh + 1, w - kw + 1

    output = np.zeros((oh, ow))

    for i in range(oh):
        for j in range(ow):
            # ์˜์—ญ ์ถ”์ถœ
            region = image[i:i+kh, j:j+kw]
            # ์š”์†Œ๋ณ„ ๊ณฑ์…ˆ ํ›„ ํ•ฉ์‚ฐ
            output[i, j] = np.sum(region * kernel)

    return output

# Sobel ์—์ง€ ๊ฒ€์ถœ ์˜ˆ์‹œ
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

edges = conv2d_numpy(image, sobel_x)

์ฐธ๊ณ : ์‹ค์ œ CNN์—์„œ๋Š” PyTorch์˜ ์ตœ์ ํ™”๋œ ๊ตฌํ˜„์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.


8. ๋ฐฐ์น˜ ์ •๊ทœํ™”์™€ Dropout

CNN์—์„œ ์‚ฌ์šฉ

class CNNWithBN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)  # Conv์šฉ BN
        self.pool = nn.MaxPool2d(2, 2)
        self.dropout = nn.Dropout2d(0.25)  # 2D Dropout

        self.fc1 = nn.Linear(32 * 14 * 14, 128)
        self.bn_fc = nn.BatchNorm1d(128)  # FC์šฉ BN
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.pool(x)
        x = self.dropout(x)

        x = x.view(-1, 32 * 14 * 14)
        x = self.fc1(x)
        x = self.bn_fc(x)
        x = F.relu(x)
        x = self.fc2(x)
        return x

9. CIFAR-10 ๋ถ„๋ฅ˜

๋ฐ์ดํ„ฐ

  • 32ร—32 RGB ์ด๋ฏธ์ง€
  • 10๊ฐœ ํด๋ž˜์Šค: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck

๋ชจ๋ธ

class CIFAR10Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # 32โ†’16

            nn.Conv2d(64, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # 16โ†’8
        )

        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(128 * 8 * 8, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, 10),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(-1, 128 * 8 * 8)
        x = self.classifier(x)
        return x

10. ์ •๋ฆฌ

ํ•ต์‹ฌ ๊ฐœ๋…

  1. ํ•ฉ์„ฑ๊ณฑ: ์ง€์—ญ ํŒจํ„ด ์ถ”์ถœ, ํŒŒ๋ผ๋ฏธํ„ฐ ๊ณต์œ 
  2. ํ’€๋ง: ๊ณต๊ฐ„ ์ถ•์†Œ, ๋ถˆ๋ณ€์„ฑ ์ฆ๊ฐ€
  3. ์ฑ„๋„: ๋‹ค์–‘ํ•œ ํŠน์ง• ํ•™์Šต
  4. ๊ณ„์ธต์  ํ•™์Šต: ์ €์ˆ˜์ค€ โ†’ ๊ณ ์ˆ˜์ค€ ํŠน์ง•

CNN vs MLP

Aspect          MLP               CNN
Connectivity    Fully connected   Local
Parameters      Many              Few (shared)
Spatial info    Discarded         Preserved
For images      Inefficient       Efficient

๋‹ค์Œ ๋‹จ๊ณ„

08_CNN_Advanced.md์—์„œ ResNet, VGG ๋“ฑ ์œ ๋ช… ์•„ํ‚คํ…์ฒ˜๋ฅผ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค.
