05. CNN Basics (Convolutional Neural Networks)
Learning Objectives

- Understand how the convolution operation works
- Learn the concepts of pooling, padding, and stride
- Implement a CNN in PyTorch
- Classify MNIST/CIFAR-10
1. The Convolution Operation

Concept

Convolution detects local patterns in an image (edges, textures).

Example

Input image     Filter (kernel)     Output
[1 2 3 4]          [1 0]             [?]
[5 6 7 8]      *   [0 1]       =
[9 0 1 2]
Formula

output[i, j] = Σ_m Σ_n input[i+m, j+n] × filter[m, n]
Dimension Calculation

output size = (input − kernel + 2 × padding) / stride + 1

Example: input 32×32, kernel 3×3, padding 1, stride 1
= (32 − 3 + 2) / 1 + 1 = 32
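The size formula is easy to get wrong under time pressure, so it is worth checking with a tiny helper. The function name `conv_output_size` below is a hypothetical illustration, not a PyTorch API:

```python
def conv_output_size(n, kernel, padding=0, stride=1):
    """Output width/height of a convolution: (n - kernel + 2*padding) // stride + 1."""
    return (n - kernel + 2 * padding) // stride + 1

# The example from the text: 32×32 input, 3×3 kernel, padding 1, stride 1
print(conv_output_size(32, 3, padding=1))  # 32 (size preserved)
# Without padding ('valid'), the output shrinks
print(conv_output_size(32, 3))             # 30
```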
2. Key Concepts

Padding

Zeros are added around the input border to preserve the output size.
padding='same': output size = input size
padding='valid': no padding (output < input)
Stride

The step size by which the filter moves.
stride=1: move one pixel at a time (default)
stride=2: move two pixels at a time → output size halved
Pooling

Reduces spatial size and increases invariance to small shifts.
Max Pooling: maximum value within each region
Avg Pooling: average value within each region
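To make max pooling concrete, here is a minimal 2×2, stride-2 sketch in NumPy (assuming even dimensions; in practice you would use nn.MaxPool2d):

```python
import numpy as np

def max_pool_2x2(x):
    """2×2 max pooling with stride 2 on a 2D array (dimensions assumed even)."""
    h, w = x.shape
    # Group the array into 2×2 blocks, then take the max of each block
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 8, 1, 2],
              [7, 6, 3, 4]])
print(max_pool_2x2(x))
# [[4 8]
#  [9 4]]
```

Each value in the 2×2 output is the maximum of one non-overlapping 2×2 region of the input, which is why the output is half the size in each dimension.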
3. CNN Architecture

Basic Structure

Input → [Conv → ReLU → Pool] × N → Flatten → FC → Output
LeNet-5 (1998)

Input (32×32×1)
    ↓
Conv1 (5×5, 6 channels) → 28×28×6
    ↓
MaxPool (2×2) → 14×14×6
    ↓
Conv2 (5×5, 16 channels) → 10×10×16
    ↓
MaxPool (2×2) → 5×5×16
    ↓
Flatten → 400
    ↓
FC → 120 → 84 → 10
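Every shape in the LeNet-5 diagram follows from the size formula in section 1; tracing them in plain arithmetic (no PyTorch needed) is a good sanity check:

```python
def out_size(n, kernel, padding=0, stride=1):
    return (n - kernel + 2 * padding) // stride + 1

n = 32                         # input 32×32×1
n = out_size(n, 5)             # Conv1 5×5, no padding → 28
n = out_size(n, 2, stride=2)   # MaxPool 2×2 → 14
n = out_size(n, 5)             # Conv2 5×5 → 10
n = out_size(n, 2, stride=2)   # MaxPool 2×2 → 5
flat = n * n * 16              # Flatten: 5×5×16 → 400
print(n, flat)  # 5 400
```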
4. PyTorch Conv2d

Basic Usage

import torch
import torch.nn as nn

# Conv2d(in_channels, out_channels, kernel_size, stride, padding)
conv = nn.Conv2d(
    in_channels=3,    # RGB image
    out_channels=64,  # 64 filters
    kernel_size=3,    # 3×3 kernel
    stride=1,
    padding=1         # same padding
)

# Input: (batch, channels, height, width)
x = torch.randn(1, 3, 32, 32)
out = conv(x)  # (1, 64, 32, 32)
MaxPool2d

pool = nn.MaxPool2d(kernel_size=2, stride=2)
# 32×32 → 16×16
5. MNIST CNN Implementation

Model Definition

import torch.nn as nn
import torch.nn.functional as F

class MNISTNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Conv block 1
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.pool1 = nn.MaxPool2d(2, 2)
        # Conv block 2
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.pool2 = nn.MaxPool2d(2, 2)
        # FC block
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        # x: (batch, 1, 28, 28)
        x = F.relu(self.conv1(x))   # (batch, 32, 28, 28)
        x = self.pool1(x)           # (batch, 32, 14, 14)
        x = F.relu(self.conv2(x))   # (batch, 64, 14, 14)
        x = self.pool2(x)           # (batch, 64, 7, 7)
        x = x.view(-1, 64 * 7 * 7)  # Flatten
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
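To get a feel for model size, the parameter count of MNISTNet can be computed by hand: each Conv2d has in×out×k×k weights plus a bias per output channel, and each Linear has in×out weights plus a bias per output. This is plain arithmetic, not a PyTorch call:

```python
# Conv2d params = in_ch * out_ch * k * k + out_ch (bias)
conv1 = 1 * 32 * 3 * 3 + 32     # 320
conv2 = 32 * 64 * 3 * 3 + 64    # 18,496
# Linear params = in_features * out_features + out_features (bias)
fc1 = 64 * 7 * 7 * 128 + 128    # 401,536
fc2 = 128 * 10 + 10             # 1,290
total = conv1 + conv2 + fc1 + fc2
print(total)  # 421642
```

Note that the first fully connected layer dominates: this is typical for small CNNs and is one reason later architectures replace large FC layers with pooling.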
Training Code

import torch
import torch.nn as nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Load the data
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
train_data = datasets.MNIST('data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_data, batch_size=64, shuffle=True)

# Model, loss, optimizer
model = MNISTNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Training loop
for epoch in range(5):
    for images, labels in train_loader:
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
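The loop above trains but never measures accuracy. The core of the evaluation step — taking the argmax over class scores and comparing with labels (in PyTorch, outputs.argmax(1) == labels) — can be sketched with NumPy on hypothetical logits:

```python
import numpy as np

# Hypothetical logits for a batch of 4 images, 10 classes (illustrative values)
logits = np.array([
    [0.1, 2.0, 0.3, 0, 0, 0, 0, 0, 0, 0],  # predicts class 1
    [3.0, 0.2, 0.1, 0, 0, 0, 0, 0, 0, 0],  # predicts class 0
    [0, 0, 0, 0, 0, 0, 0, 5.0, 0, 0],      # predicts class 7
    [0, 0, 1.0, 0, 0, 0, 0, 0, 0, 2.0],    # predicts class 9
])
labels = np.array([1, 0, 7, 3])

preds = logits.argmax(axis=1)        # class with the highest score per image
accuracy = (preds == labels).mean()  # fraction of correct predictions
print(accuracy)  # 0.75
```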
6. Feature Map Visualization

import matplotlib.pyplot as plt
import torch
import torch.nn.functional as F

def visualize_feature_maps(model, image):
    """Visualize the feature maps of the first Conv layer."""
    model.eval()
    with torch.no_grad():
        # Output of the first Conv layer
        x = model.conv1(image)
        x = F.relu(x)
    # Display as a grid
    fig, axes = plt.subplots(4, 8, figsize=(12, 6))
    for i, ax in enumerate(axes.flat):
        if i < x.shape[1]:
            ax.imshow(x[0, i].cpu().numpy(), cmap='viridis')
        ax.axis('off')
    plt.tight_layout()
    plt.savefig('feature_maps.png')
7. Understanding Convolution with NumPy (Reference)

import numpy as np

def conv2d_numpy(image, kernel):
    """2D convolution in NumPy (educational)."""
    h, w = image.shape
    kh, kw = kernel.shape
    oh, ow = h - kh + 1, w - kw + 1
    output = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Extract the region
            region = image[i:i+kh, j:j+kw]
            # Element-wise multiply, then sum
            output[i, j] = np.sum(region * kernel)
    return output

# Sobel edge detection example
# (image: assumed to be a 2D grayscale NumPy array defined elsewhere)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
edges = conv2d_numpy(image, sobel_x)

Note: real CNNs use PyTorch's optimized implementation.
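The same loop also answers the [?] in the example from section 1: convolving the 3×4 input with the 2×2 kernel [[1, 0], [0, 1]] yields a 2×3 output. The block below is self-contained (it repeats the educational convolution inline):

```python
import numpy as np

def conv2d_numpy(image, kernel):
    """2D convolution (valid mode, stride 1) via explicit loops."""
    h, w = image.shape
    kh, kw = kernel.shape
    output = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(output.shape[0]):
        for j in range(output.shape[1]):
            output[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return output

image = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 0, 1, 2]])
kernel = np.array([[1, 0],
                   [0, 1]])
print(conv2d_numpy(image, kernel))
# [[ 7.  9. 11.]
#  [ 5.  7.  9.]]
```

For example, the top-left output is 1·1 + 2·0 + 5·0 + 6·1 = 7, matching the formula in section 1.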
8. Batch Normalization and Dropout

Usage in a CNN

import torch.nn as nn
import torch.nn.functional as F

class CNNWithBN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)      # BN for Conv layers
        self.pool = nn.MaxPool2d(2, 2)
        self.dropout = nn.Dropout2d(0.25)  # 2D Dropout
        self.fc1 = nn.Linear(32 * 14 * 14, 128)
        self.bn_fc = nn.BatchNorm1d(128)   # BN for FC layers
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.pool(x)
        x = self.dropout(x)
        x = x.view(-1, 32 * 14 * 14)
        x = self.fc1(x)
        x = self.bn_fc(x)
        x = F.relu(x)
        x = self.fc2(x)
        return x
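What batch normalization computes can be sketched in NumPy: per feature, subtract the batch mean and divide by the batch standard deviation. This sketch ignores the learnable scale/shift (γ, β) and the running statistics that nn.BatchNorm2d maintains:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature over the batch axis to mean 0, variance 1 (no γ/β)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

# Two features with very different scales
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
y = batch_norm(x)
print(y.mean(axis=0))  # ≈ [0, 0]
print(y.std(axis=0))   # ≈ [1, 1]
```

After normalization both features land on the same scale, which is why BN stabilizes training regardless of the raw activation magnitudes.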
9. CIFAR-10 Classification

Data

- 32×32 RGB images
- 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck

Model

import torch.nn as nn

class CIFAR10Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # 32 → 16
            nn.Conv2d(64, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # 16 → 8
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(128 * 8 * 8, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, 10),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(-1, 128 * 8 * 8)
        x = self.classifier(x)
        return x
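The 128 * 8 * 8 flatten size follows from the two pooling layers; checking it by hand (using the size formula from section 1) avoids a common shape-mismatch bug when editing the model:

```python
def out_size(n, kernel, padding=0, stride=1):
    return (n - kernel + 2 * padding) // stride + 1

n = 32
n = out_size(n, 3, padding=1)  # conv block 1 keeps 32 ('same' padding)
n = out_size(n, 2, stride=2)   # first MaxPool → 16
n = out_size(n, 3, padding=1)  # conv block 2 keeps 16
n = out_size(n, 2, stride=2)   # second MaxPool → 8
print(n, 128 * n * n)  # 8 8192
```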
10. Summary

Key Concepts

- Convolution: extracts local patterns, shares parameters
- Pooling: reduces spatial size, increases invariance
- Channels: learn diverse features
- Hierarchical learning: low-level → high-level features
CNN vs MLP

| Aspect | MLP | CNN |
|---|---|---|
| Connectivity | Fully connected | Local connections |
| Parameters | Many | Few (shared) |
| Spatial information | Ignored | Preserved |
| Images | Inefficient | Efficient |
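The "few parameters" row of the table can be made concrete with one comparison: a single fully connected layer from a 32×32×3 image to 1,000 hidden units versus a single 3×3 convolution from 3 to 64 channels (biases included; the 1,000-unit width is an illustrative choice):

```python
# MLP: every input pixel connects to every hidden unit
mlp_params = 32 * 32 * 3 * 1000 + 1000  # 3,073,000

# CNN: one 3×3 kernel per (in, out) channel pair, shared across all positions
cnn_params = 3 * 64 * 3 * 3 + 64        # 1,792

print(mlp_params, cnn_params)  # 3073000 1792
```

The convolution is over a thousand times smaller because its weights are reused at every spatial position, which is exactly the parameter sharing listed under "Key Concepts".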
Next Steps

08_CNN_Advanced.md covers well-known architectures such as ResNet and VGG.