04. Training Techniques
Learning Objectives
- Understand gradient descent variants (SGD, Momentum, Adam)
- Learning rate scheduling
- Regularization techniques (Dropout, Weight Decay, Batch Norm)
- Preventing overfitting and early stopping
1. Gradient Descent
Basic Principle
W(t+1) = W(t) − η × ∇L
- η: learning rate
- ∇L: gradient of the loss function
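A minimal NumPy sketch of this update, assuming grad = ∇L has already been computed for the current parameters:

import numpy as np

def gradient_descent_step(W, grad, lr=0.01):
    # W(t+1) = W(t) - η × ∇L
    return W - lr * grad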
Variants
| Method | Update Rule | Characteristics |
|---|---|---|
| SGD | W -= lr × g | Simple, slow |
| Momentum | v = βv + g; W -= lr × v | Adds inertia |
| AdaGrad | Adaptive learning rate | Good for sparse data |
| RMSprop | Exponential moving average | Improves on AdaGrad |
| Adam | Momentum + RMSprop | Most widely used |
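Momentum and Adam are implemented in full in the sections below; the AdaGrad and RMSprop rows of the table can be sketched in NumPy as follows (s holds the squared-gradient statistics, initialized to zeros; eps guards against division by zero):

def adagrad_step(W, grad, s, lr=0.01, eps=1e-8):
    s = s + grad ** 2                        # accumulate all squared gradients
    W = W - lr * grad / (np.sqrt(s) + eps)   # per-parameter adaptive step
    return W, s

def rmsprop_step(W, grad, s, lr=0.001, beta=0.9, eps=1e-8):
    s = beta * s + (1 - beta) * grad ** 2    # exponential moving average instead
    W = W - lr * grad / (np.sqrt(s) + eps)
    return W, s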
2. Momentum
Adds inertia to the update to damp oscillation.
v(t) = β × v(t−1) + ∇L
W(t+1) = W(t) − η × v(t)
NumPy Implementation
def sgd_momentum(W, grad, v, lr=0.01, beta=0.9):
    v = beta * v + grad   # update velocity
    W = W - lr * v        # update weights
    return W, v
PyTorch
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
3. Adam Optimizer
Combines the strengths of Momentum and RMSprop.
m(t) = β₁ × m(t−1) + (1 − β₁) × g     # first moment
v(t) = β₂ × v(t−1) + (1 − β₂) × g²    # second moment
m̂ = m / (1 − β₁ᵗ)                     # bias correction
v̂ = v / (1 − β₂ᵗ)
W = W − η × m̂ / (√v̂ + ε)
NumPy Implementation
def adam(W, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad         # first moment
    v = beta2 * v + (1 - beta2) * (grad ** 2)  # second moment
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    W = W - lr * m_hat / (np.sqrt(v_hat) + eps)
    return W, m, v
PyTorch
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
4. Learning Rate Scheduling
Adjusts the learning rate over the course of training.
Main Methods
| Method | Characteristics |
|---|---|
| Step Decay | Multiplies the rate by γ every N epochs |
| Exponential | lr = lr₀ × γ^epoch |
| Cosine Annealing | Decays following a cosine curve |
| ReduceLROnPlateau | Decays when the validation loss plateaus |
| Warmup | Gradually increases the rate at the start (see the sketch after the examples below) |
PyTorch Examples
# Step Decay
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
# Cosine Annealing
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
# ReduceLROnPlateau
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', patience=10, factor=0.5
)
# In the training loop
for epoch in range(epochs):
    train(...)
    scheduler.step()  # call at the end of each epoch
                      # (ReduceLROnPlateau takes the metric: scheduler.step(val_loss))
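The warmup entry from the table has no dedicated scheduler in the examples above; one common way to express it is through LambdaLR. A minimal sketch, assuming a linear 5-epoch warmup followed by a constant rate:

# Warmup: scale the base lr from 1/5 up to 1 over the first 5 epochs, then hold
warmup_epochs = 5
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: min(1.0, (epoch + 1) / warmup_epochs)
)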
5. Dropout
Randomly deactivates neurons during training.
Principle
Training:  y = x × mask / (1 − p)   # mask ~ Bernoulli(1 − p)
Inference: y = x                    # no mask
NumPy Implementation
def dropout(x, p=0.5, training=True):
    if not training:
        return x
    # each entry is kept with probability 1 - p (inverted dropout)
    mask = (np.random.rand(*x.shape) > p).astype(float)
    return x * mask / (1 - p)
PyTorch
import torch.nn as nn
import torch.nn.functional as F

class MLPWithDropout(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, dropout_p=0.5):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.dropout = nn.Dropout(p=dropout_p)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.dropout(x)  # active only during training
        x = self.fc2(x)
        return x

# At inference time
model.eval()  # disables dropout
6. Batch Normalization
Normalizes the input of each layer.
Formula
μ = mean(x)
σ² = var(x)
x̂ = (x − μ) / √(σ² + ε)
y = γ × x̂ + β   # γ, β are learnable parameters
NumPy Implementation
def batch_norm(x, gamma, beta, eps=1e-5, training=True,
               running_mean=None, running_var=None, momentum=0.1):
    if training:
        mean = np.mean(x, axis=0)
        var = np.var(x, axis=0)
        # update the running averages used at inference time
        if running_mean is not None:
            running_mean = momentum * mean + (1 - momentum) * running_mean
            running_var = momentum * var + (1 - momentum) * running_var
    else:
        mean = running_mean
        var = running_var
    x_norm = (x - mean) / np.sqrt(var + eps)
    # also return the updated running stats so the caller can carry them forward
    return gamma * x_norm + beta, running_mean, running_var
PyTorch
class CNNWithBatchNorm(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, 3)
        self.bn1 = nn.BatchNorm2d(64)
        self.pool = nn.AdaptiveAvgPool2d(1)  # collapse spatial dims so fc1's input is 64
        self.fc1 = nn.Linear(64, 10)
        self.bn_fc = nn.BatchNorm1d(10)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = self.pool(x)
        x = x.flatten(1)
        x = self.bn_fc(self.fc1(x))
        return x
7. Weight Decay (L2 Regularization)
Penalizes large weight magnitudes.
Formula
L_total = L_data + λ × ||W||²
∇L_total = ∇L_data + 2λW
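Substituting the regularized gradient into the basic update rule gives a one-line NumPy sketch (lam stands for λ, grad for ∇L_data):

W = W - lr * (grad + 2 * lam * W)  # L2 penalty folded into the gradient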
PyTorch
# Method 1: set weight_decay on the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
# Method 2: add the penalty to the loss directly
l2_lambda = 1e-4
l2_reg = sum(p.pow(2).sum() for p in model.parameters())
loss = criterion(output, target) + l2_lambda * l2_reg
8. Early Stopping
Stops training when the validation loss stops improving.
PyTorch Implementation
class EarlyStopping:
    def __init__(self, patience=10, min_delta=0):
        self.patience = patience
        self.min_delta = min_delta
        self.counter = 0
        self.best_loss = None
        self.early_stop = False

    def __call__(self, val_loss):
        if self.best_loss is None:
            self.best_loss = val_loss
        elif val_loss > self.best_loss - self.min_delta:
            # no sufficient improvement: count toward patience
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True
        else:
            self.best_loss = val_loss
            self.counter = 0

# Usage
early_stopping = EarlyStopping(patience=10)
for epoch in range(epochs):
    train_loss = train(model, train_loader)
    val_loss = validate(model, val_loader)
    early_stopping(val_loss)
    if early_stopping.early_stop:
        print(f"Early stopping at epoch {epoch}")
        break
9. Data Augmentation
Transforms the training data to increase its diversity.
Image Data
from torchvision import transforms
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
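As a usage sketch, the composed transform is passed to a dataset so every image is re-augmented each time it is loaded; CIFAR-10 is an illustrative choice here, matching the 32×32 RandomCrop above:

from torchvision import datasets

train_set = datasets.CIFAR10(root='./data', train=True, download=True,
                             transform=transform)  # augmented at load time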
10. NumPy vs PyTorch Comparison
Optimizer Implementation
# NumPy (manual implementation)
m = np.zeros_like(W)
v = np.zeros_like(W)
for t in range(1, epochs + 1):
    grad = compute_gradient(W, X, y)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    W -= lr * m_hat / (np.sqrt(v_hat) + eps)
# PyTorch (automatic)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
for epoch in range(epochs):
    loss = criterion(model(X), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
Summary
Key Concepts
- Optimizer: Adam is the default choice; SGD with Momentum remains effective
- Learning rate: proper scheduling improves convergence
- Regularization: combine Dropout, BatchNorm, and Weight Decay
- Early stopping: a basic safeguard against overfitting
Recommended Starting Configuration
# Default setup
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5)
Next Steps
07_CNN_Basics.md covers convolutional neural networks.