# 05. ResNet
## Overview

ResNet (Residual Network) is the landmark model that won ILSVRC 2015. The skip connection (residual connection) proposed by Kaiming He et al. made it possible to train networks with hundreds of layers or more.

> Solves the degradation problem, where accuracy falls as the network gets deeper.
## Mathematical Background

### 1. Degradation Problem

**Problem:** as the network gets deeper, accuracy actually gets worse.

Observations:

- A 56-layer network underperforms a 20-layer network on CIFAR-10
- This is not overfitting (the training error is higher too)
- It is an optimization difficulty (vanishing/exploding gradients)

Ideally:

- A deeper network should do at least as well as a shallower one
- At minimum, the extra layers should be able to learn the identity mapping
### 2. Residual Learning

Conventional approach:

```
H(x) = desired output
```

The network learns H(x) directly.

Residual approach:

```
F(x) = H(x) - x   (residual)
H(x) = F(x) + x   (original target)
```

Why is this easier?

- Learning the identity mapping: the block only has to drive F(x) to 0
- Learning a small perturbation of x is easier than learning a whole transformation
- Gradient flow: the addition passes gradients through directly
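The formulation maps directly onto code. A minimal PyTorch sketch (the class name and layer sizes are illustrative, not taken from this repo's files):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: the layers learn F(x), the output is F(x) + x."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))  # first half of F(x)
        out = self.bn2(self.conv2(out))            # second half of F(x)
        return torch.relu(out + x)                 # H(x) = F(x) + x
```

Falling back to the identity only requires the two convolutions to contribute nothing, which is a much easier target than reproducing x from scratch.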
### 3. Gradients Through the Skip Connection

Forward:

```
y = F(x) + x
```

Backward:

```
∂L/∂x = ∂L/∂y × (∂F/∂x + 1)
                         ↑
             the constant 1 term
```

Result:

- The "+1" gives the loss gradient a direct path back to the input
- The gradient signal survives even across hundreds of layers
- This mitigates the vanishing-gradient problem
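A toy autograd experiment makes this visible. The stacks below are fully connected tanh layers rather than a real ResNet, so the exact numbers are illustrative; the point is the gap between the two settings:

```python
import torch
import torch.nn as nn

def mean_grad_at_input(depth: int, residual: bool) -> float:
    torch.manual_seed(0)
    layers = [nn.Linear(64, 64) for _ in range(depth)]
    x = torch.randn(8, 64, requires_grad=True)
    h = x
    for layer in layers:
        f = torch.tanh(layer(h))
        h = h + f if residual else f  # y = F(x) + x  vs  y = F(x)
    h.sum().backward()
    return x.grad.abs().mean().item()

# The plain stack's input gradient collapses; the residual stack's does not.
print("plain 50-layer   :", mean_grad_at_input(50, residual=False))
print("residual 50-layer:", mean_grad_at_input(50, residual=True))
```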
### 4. Matching Dimensions (Projection Shortcut)

When the shapes differ (stride=2 or a change in channel count), the shortcut must be adjusted too.

Option A: zero padding

```
x_padded = pad(x, extra_channels)
```

Option B: 1×1 convolution (the option adopted in the paper)

```
shortcut = Conv1×1(x)

x: (N, 64, 56, 56)
   ↓ stride=2, channels 64→128
y: (N, 128, 28, 28)

shortcut = Conv1×1(64→128, stride=2)
```
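In PyTorch, Option B with the shapes above is a strided 1×1 convolution followed by BN (torchvision builds its downsample branch the same way):

```python
import torch
import torch.nn as nn

# Projection shortcut: match the main path's stride-2, 64 -> 128 change.
shortcut = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)

x = torch.randn(1, 64, 56, 56)
print(shortcut(x).shape)  # torch.Size([1, 128, 28, 28]), same as the main path
```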
## ResNet Architecture

### BasicBlock vs Bottleneck

BasicBlock (ResNet-18, 34):

```
┌─────────────────────┐
│ Conv 3×3, BN, ReLU  │
│ Conv 3×3, BN        │
│        ↓            │
│    + ← shortcut     │
│      ReLU           │
└─────────────────────┘
```

Bottleneck (ResNet-50, 101, 152):

```
┌─────────────────────┐
│ Conv 1×1, BN, ReLU  │ ← reduce channels
│ Conv 3×3, BN, ReLU  │ ← main computation
│ Conv 1×1, BN        │ ← restore channels
│        ↓            │
│    + ← shortcut     │
│      ReLU           │
└─────────────────────┘
```

Bottleneck advantages:

- Channels are reduced before the 3×3 convolution → much less computation
- More layers for the same compute budget
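A quick back-of-the-envelope count makes the saving concrete, comparing the paper's bottleneck (256 → 64 → 64 → 256) against a hypothetical BasicBlock kept at 256 channels throughout:

```python
# Weight counts only (BN parameters and biases ignored).
basic = 2 * (3 * 3 * 256 * 256)           # two 3×3 convs at 256 channels
bottleneck = (1 * 1 * 256 * 64            # 1×1 reduce: 256 -> 64
              + 3 * 3 * 64 * 64           # 3×3 on the narrow representation
              + 1 * 1 * 64 * 256)         # 1×1 restore: 64 -> 256
print(basic, bottleneck, round(basic / bottleneck, 1))  # 1179648 69632 16.9
```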
### ResNet Variant Comparison

| Model | Layers | Block type | Blocks per stage | Params |
|---|---|---|---|---|
| ResNet-18 | 18 | Basic | [2, 2, 2, 2] | 11.7M |
| ResNet-34 | 34 | Basic | [3, 4, 6, 3] | 21.8M |
| ResNet-50 | 50 | Bottleneck | [3, 4, 6, 3] | 25.6M |
| ResNet-101 | 101 | Bottleneck | [3, 4, 23, 3] | 44.5M |
| ResNet-152 | 152 | Bottleneck | [3, 8, 36, 3] | 60.2M |
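The Params column can be checked directly against torchvision (random weights; only the counts matter):

```python
import torchvision.models as models

for name in ["resnet18", "resnet34", "resnet50", "resnet101", "resnet152"]:
    model = getattr(models, name)()
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name:10s} {n_params / 1e6:.1f}M")  # matches the table above
```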
### ResNet-50 in Detail

```
Input: 224×224×3

Conv1: 7×7, 64, stride=2, padding=3
  ↓ (112×112×64)
MaxPool: 3×3, stride=2, padding=1
  ↓ (56×56×64)
Layer1: Bottleneck × 3 (64→256)
  ↓ (56×56×256)
Layer2: Bottleneck × 4 (128→512, stride=2)
  ↓ (28×28×512)
Layer3: Bottleneck × 6 (256→1024, stride=2)
  ↓ (14×14×1024)
Layer4: Bottleneck × 3 (512→2048, stride=2)
  ↓ (7×7×2048)
AdaptiveAvgPool → (1×1×2048)
FC: 2048 → 1000
```
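The same trace can be replayed on torchvision's resnet50 by walking its top-level children:

```python
import torch
import torchvision.models as models

model = models.resnet50().eval()
x = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    for name, module in model.named_children():
        if name == "fc":
            x = torch.flatten(x, 1)  # (1, 2048), as in resnet50's own forward()
        x = module(x)
        print(f"{name:8s} -> {tuple(x.shape)}")
```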
## File Structure

```
05_ResNet/
├── README.md                      # this file
├── pytorch_lowlevel/
│   └── resnet_lowlevel.py         # F.conv2d, manual BN
├── paper/
│   └── resnet_paper.py            # faithful reproduction of the paper
├── analysis/
│   └── gradient_flow.py           # skip-connection effect analysis
└── exercises/
    ├── 01_gradient_analysis.md    # gradient flow comparison
    └── 02_ablation_study.md       # comparing shortcut types
```
## Key Concepts

### 1. Why Identity Mapping Matters

The two variants below are forward passes of a residual block (the conv/BN layers are defined in the block's __init__, omitted here):

```python
# Pre-activation ResNet (v2): BN and ReLU precede each conv,
# so nothing on the shortcut path is modified.
def forward(self, x):
    identity = x
    out = self.bn1(x)
    out = F.relu(out)
    out = self.conv1(out)
    out = self.bn2(out)
    out = F.relu(out)
    out = self.conv2(out)
    return out + identity  # clean identity path

# Post-activation (original v1)
def forward(self, x):
    identity = self.shortcut(x)
    out = self.conv1(x)
    out = self.bn1(out)
    out = F.relu(out)
    out = self.conv2(out)
    out = self.bn2(out)
    out = F.relu(out + identity)  # the final ReLU modifies the identity signal
    return out
```
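He et al. (2016) report that this clean identity path is what allows extremely deep pre-activation networks, such as their 1001-layer CIFAR-10 model, to train stably, whereas in the post-activation form every block's final ReLU perturbs the shortcut signal.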
### 2. ResNet as an Ensemble

ResNet can be viewed as an ensemble of paths of many different depths: with n residual blocks there are 2^n possible paths, because each block can either be taken or skipped.

- paths that skip over some blocks
- paths that pass through every block

Experiment: removing individual blocks from a trained ResNet barely hurts accuracy → paths of many different depths are trained together.
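A sketch of that lesion experiment in the style of Veit et al. (2016). With the randomly initialized model used here it only demonstrates that the network still runs after a block is removed; reproducing the accuracy result needs pretrained weights and a real evaluation set:

```python
import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet50().eval()
x = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    before = model(x)
    # Blocks 1..5 of layer3 map 1024 -> 1024 at the same resolution,
    # so any one of them can be swapped for an identity without shape errors.
    model.layer3[3] = nn.Identity()
    after = model(x)
    print((before - after).abs().max())  # forward pass still works end to end
```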
### 3. The Role of Batch Normalization

Why BN matters in ResNet:

1. Reduces internal covariate shift
   - stabilizes the distribution of each layer's inputs
2. Permits higher learning rates
   - faster convergence
3. Acts as a regularizer
   - mini-batch statistics inject noise
4. Improves gradient flow
   - normalization keeps gradients well-scaled
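The normalization itself is only a few lines. A minimal training-mode sketch (the running-statistics bookkeeping that a full implementation such as resnet_lowlevel.py needs is omitted):

```python
import torch

def batch_norm2d(x, gamma, beta, eps: float = 1e-5):
    """Per-channel batch norm over the (N, H, W) dimensions, training mode."""
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma.view(1, -1, 1, 1) * x_hat + beta.view(1, -1, 1, 1)

x = torch.randn(8, 64, 56, 56)
y = batch_norm2d(x, gamma=torch.ones(64), beta=torch.zeros(64))
print(round(y.mean().item(), 4), round(y.std().item(), 4))  # ~0.0, ~1.0
```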
### 4. Developments After ResNet

ResNeXt (2017):
- introduced cardinality via grouped convolutions
- ResNeXt-50 reaches ResNet-101-level accuracy with fewer parameters

DenseNet (2017):
- connects every layer to all subsequent layers
- maximizes feature reuse

EfficientNet (2019):
- scales width, depth, and resolution jointly
- compound scaling

RegNet (2020):
- searches the network design space
- favors simple, regular designs
## Implementation Levels

### Level 2: PyTorch Low-Level (pytorch_lowlevel/)

- F.conv2d, manual BatchNorm
- BasicBlock and Bottleneck implemented by hand
- shortcut projection implementation
- manual parameter management

### Level 3: Paper Implementation (paper/)

- the full ResNet-18/34/50/101/152 family
- pre-activation ResNet (v2)
- zero-padding vs projection shortcut comparison

### Level 4: Code Analysis (analysis/)

- walkthrough of the torchvision ResNet code
- gradient flow visualization
- intermediate block removal experiments
## Learning Checklist

- [ ] Understand the degradation problem
- [ ] Derive the residual learning formulation
- [ ] Explain the gradient advantage of skip connections
- [ ] Know the difference between BasicBlock and Bottleneck
- [ ] Memorize the ResNet-50 architecture
- [ ] Know how to implement a projection shortcut
- [ ] Know the difference between pre- and post-activation
- [ ] Understand the ensemble view of ResNet
## References

- He et al. (2015). "Deep Residual Learning for Image Recognition"
- He et al. (2016). "Identity Mappings in Deep Residual Networks" (ResNet v2)
- torchvision ResNet
- d2l.ai: ResNet
- ../04_VGG/README.md