04. VGG

Overview

VGGNet์€ 2014๋…„ ILSVRC์—์„œ 2์œ„๋ฅผ ์ฐจ์ง€ํ•œ ๋ชจ๋ธ๋กœ, Karen Simonyan๊ณผ Andrew Zisserman์ด ์ œ์•ˆํ–ˆ์Šต๋‹ˆ๋‹ค. "Very Deep Convolutional Networks for Large-Scale Image Recognition" ๋…ผ๋ฌธ์—์„œ 3x3 ์ž‘์€ ํ•„ํ„ฐ๋ฅผ ๊นŠ๊ฒŒ ์Œ“๋Š” ๊ฒƒ์ด ํšจ๊ณผ์ ์ž„์„ ๋ณด์—ฌ์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค.


์ˆ˜ํ•™์  ๋ฐฐ๊ฒฝ

1. The Effect of Stacking 3×3 Filters

Why stack several 3×3 filters?

Two 3×3 convs ≈ one 5×5 conv (same receptive field)
Three 3×3 convs ≈ one 7×7 conv

Advantages:
1. Fewer parameters (for C input and C output channels):
   - One 7×7: 49C² parameters
   - Three 3×3: 27C² parameters (~45% reduction)

2. More non-linearity:
   - One 7×7: 1 ReLU
   - Three 3×3: 3 ReLUs → can learn more complex functions

2. Receptive Field Calculation

The receptive field (RF) grows as layers stack. Applying the recurrence from the output layer back toward the input:

RF_in = (RF_out - 1) × stride + kernel_size

Example (stride=1, kernel=3):
- Layer 1: RF = 3
- Layer 2: RF = 5
- Layer 3: RF = 7
- Layer 4: RF = 9
...

After a MaxPool (kernel=2, stride=2), the effective stride doubles, so every subsequent layer expands the RF twice as fast.
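The same recurrence can be run forward through the VGG16 conv stack: each layer adds (kernel − 1) × jump to the RF, where jump is the product of all strides seen so far. A small sketch, with the layer list written out from the VGG16 architecture:

```python
# Forward receptive-field computation.
def receptive_field(layers):
    rf, jump = 1, 1
    history = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the RF by (k-1)*jump
        jump *= stride             # strides compound multiplicatively
        history.append(rf)
    return history

# VGG16 conv stack: [2, 2, 3, 3, 3] convs (3x3, stride 1) per block,
# each block followed by a 2x2, stride-2 max pool.
layers = []
for n_convs in [2, 2, 3, 3, 3]:
    layers += [(3, 1)] * n_convs + [(2, 2)]

history = receptive_field(layers)
print(history[0])   # 3   (first conv)
print(history[1])   # 5   (second conv)
print(history[-1])  # 212 (RF at the final pooling output)
```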

3. Feature Map Size Changes

Conv (stride=1, padding=1, kernel=3):
  H_out = H_in  (size preserved)

MaxPool (kernel=2, stride=2):
  H_out = H_in / 2  (size halved)

224 โ†’ [Convร—2] โ†’ 224 โ†’ Pool โ†’ 112 โ†’ [Convร—2] โ†’ 112 โ†’ Pool โ†’ 56 โ†’ ...
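Both rules follow from the standard output-size formula, H_out = ⌊(H_in + 2·padding − kernel) / stride⌋ + 1. A quick check that traces the spatial size through all five blocks:

```python
def out_size(h_in, kernel, stride, padding):
    # Standard conv / pool output-size formula
    return (h_in + 2 * padding - kernel) // stride + 1

h = 224
for block in range(5):
    h = out_size(h, kernel=3, stride=1, padding=1)  # conv keeps the size
    h = out_size(h, kernel=2, stride=2, padding=0)  # pool halves it
    print(h)  # 112, 56, 28, 14, 7
```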

VGG ์•„ํ‚คํ…์ฒ˜

VGG ๋ณ€ํ˜• ๋น„๊ต

๊ตฌ์„ฑ VGG11 VGG13 VGG16 VGG19
Conv Layers 8 10 13 16
FC Layers 3 3 3 3
Total Layers 11 13 16 19
Parameters 133M 133M 138M 144M

VGG16 ์ƒ์„ธ ๊ตฌ์กฐ

์ž…๋ ฅ: 224ร—224ร—3 RGB ์ด๋ฏธ์ง€

Block 1: [Conv3-64] ร— 2 + MaxPool
  (224ร—224ร—3) โ†’ (224ร—224ร—64) โ†’ (112ร—112ร—64)

Block 2: [Conv3-128] ร— 2 + MaxPool
  (112ร—112ร—64) โ†’ (112ร—112ร—128) โ†’ (56ร—56ร—128)

Block 3: [Conv3-256] ร— 3 + MaxPool
  (56ร—56ร—128) โ†’ (56ร—56ร—256) โ†’ (28ร—28ร—256)

Block 4: [Conv3-512] ร— 3 + MaxPool
  (28ร—28ร—256) โ†’ (28ร—28ร—512) โ†’ (14ร—14ร—512)

Block 5: [Conv3-512] ร— 3 + MaxPool
  (14ร—14ร—512) โ†’ (14ร—14ร—512) โ†’ (7ร—7ร—512)

Classifier:
  Flatten: 7ร—7ร—512 = 25,088
  FC1: 25088 โ†’ 4096 + ReLU + Dropout
  FC2: 4096 โ†’ 4096 + ReLU + Dropout
  FC3: 4096 โ†’ 1000 (classes)

ํŒŒ๋ผ๋ฏธํ„ฐ ๋ถ„ํฌ:
- Conv layers: ~15M (11%)
- FC layers: ~124M (89%)  โ† ๋Œ€๋ถ€๋ถ„!
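This split can be verified by counting parameters directly from the VGG16 layer shapes (a sketch counting weights plus biases, with a 3-channel input and 1000 classes):

```python
def vgg16_param_counts():
    # Output channels of the 13 conv layers, in order
    conv_out = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
    conv_params, c_in = 0, 3
    for c_out in conv_out:
        conv_params += 3 * 3 * c_in * c_out + c_out  # 3x3 weights + bias
        c_in = c_out

    # Fully connected layers: 25088 -> 4096 -> 4096 -> 1000
    fc_dims = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]
    fc_params = sum(i * o + o for i, o in fc_dims)
    return conv_params, fc_params

conv_p, fc_p = vgg16_param_counts()
total = conv_p + fc_p
print(conv_p, fc_p, total)              # 14714688 123642856 138357544
print(f"FC share: {fc_p / total:.0%}")  # 89%
```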

VGG Configurations

cfg = {
    'VGG11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    'VGG16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    'VGG19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}
# 'M' = MaxPool
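Such a config list is typically expanded into a layer stack by a small helper. A sketch in the style of torchvision's VGG builder (inserting BatchNorm after each conv is a common variation, omitted here):

```python
import torch
import torch.nn as nn

def make_layers(cfg):
    """Expand a VGG config list into an nn.Sequential feature extractor."""
    layers, in_ch = [], 3
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers.append(nn.Conv2d(in_ch, v, kernel_size=3, padding=1))
            layers.append(nn.ReLU(inplace=True))
            in_ch = v
    return nn.Sequential(*layers)

vgg16_cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']
features = make_layers(vgg16_cfg)
x = torch.zeros(1, 3, 224, 224)
print(features(x).shape)  # torch.Size([1, 512, 7, 7])
```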

ํŒŒ์ผ ๊ตฌ์กฐ

04_VGG/
โ”œโ”€โ”€ README.md                      # ์ด ํŒŒ์ผ
โ”œโ”€โ”€ pytorch_lowlevel/
โ”‚   โ””โ”€โ”€ vgg_lowlevel.py           # F.conv2d, F.linear ์‚ฌ์šฉ
โ”œโ”€โ”€ paper/
โ”‚   โ””โ”€โ”€ vgg_paper.py              # ๋…ผ๋ฌธ ์•„ํ‚คํ…์ฒ˜ ์ •ํ™• ์žฌํ˜„
โ””โ”€โ”€ exercises/
    โ”œโ”€โ”€ 01_feature_visualization.md   # ๊ฐ ๋ธ”๋ก feature map ์‹œ๊ฐํ™”
    โ””โ”€โ”€ 02_transfer_learning.md       # ์‚ฌ์ „ํ•™์Šต ๊ฐ€์ค‘์น˜ ํ™œ์šฉ

Key Concepts

1. Deep & Narrow vs Shallow & Wide

Before VGG: large filters + shallow networks
  - AlexNet: 11×11 and 5×5 filters
  - few layers

VGG: small filters + deep networks
  - only 3×3 filters (plus some 1×1)
  - 16-19 layers

Conclusion: depth matters greatly for performance

2. ๊ท ์ผํ•œ ๊ตฌ์กฐ

VGG์˜ ์„ค๊ณ„ ์›์น™:

1. ๋ชจ๋“  Conv๋Š” 3ร—3, stride=1, padding=1
2. ๋ชจ๋“  MaxPool์€ 2ร—2, stride=2
3. ๋ธ”๋ก๋งˆ๋‹ค ์ฑ„๋„ ์ˆ˜ 2๋ฐฐ ์ฆ๊ฐ€ (64โ†’128โ†’256โ†’512)
4. ๊ฐ„๋‹จํ•˜๊ณ  ๊ทœ์น™์  โ†’ ์ดํ•ด/๊ตฌํ˜„ ์šฉ์ด

3. VGG์˜ ํ•œ๊ณ„

๋‹จ์ :
1. ํŒŒ๋ผ๋ฏธํ„ฐ ๊ณผ๋‹ค (138M, ResNet-50: 25M)
2. ๋ฉ”๋ชจ๋ฆฌ ์†Œ๋น„ ํผ (FC ๋ ˆ์ด์–ด)
3. ํ•™์Šต ๋А๋ฆผ
4. Gradient vanishing (๊นŠ์–ด์งˆ์ˆ˜๋ก)

ํ›„์† ์—ฐ๊ตฌ:
- GoogLeNet: Inception ๋ชจ๋“ˆ๋กœ ํšจ์œจ์„ฑ
- ResNet: Skip connection์œผ๋กœ ๋” ๊นŠ๊ฒŒ
- MobileNet: Depthwise separable conv

4. VGG as Feature Extractor

VGG is widely used as a feature extractor:

1. Style transfer
   - content: block4_conv2
   - style: conv1 of blocks 1-5

2. Perceptual loss
   - compare VGG features instead of raw pixel differences

3. Object detection
   - VGG backbone + detection head

Implementation Levels

Level 2: PyTorch Low-Level (pytorch_lowlevel/)

  • uses F.conv2d, F.max_pool2d, F.linear
  • no nn.Conv2d or nn.Linear
  • manual parameter initialization and management
  • block-level modularization

Level 3: Paper Implementation (paper/)

  • reproduces every setting from the paper
  • adds Batch Normalization (VGG-BN)
  • supports the various VGG variants

ํ•™์Šต ์ฒดํฌ๋ฆฌ์ŠคํŠธ

  • [ ] 3ร—3 ํ•„ํ„ฐ ์Šคํƒ์˜ ์žฅ์  ์ดํ•ด
  • [ ] Receptive field ๊ณ„์‚ฐ ๋ฐฉ๋ฒ• ์ˆ™์ง€
  • [ ] VGG16 ์•„ํ‚คํ…์ฒ˜ ์•”๊ธฐ
  • [ ] ํŒŒ๋ผ๋ฏธํ„ฐ ๋ถ„ํฌ ์ดํ•ด (Conv vs FC)
  • [ ] VGG๋ฅผ feature extractor๋กœ ํ™œ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•
  • [ ] VGG์˜ ํ•œ๊ณ„์™€ ํ›„์† ๋ชจ๋ธ ๋น„๊ต

์ฐธ๊ณ  ์ž๋ฃŒ

to navigate between lessons