09. LSTM and GRU
Learning Objectives
- Understand the structure of LSTM and GRU
- Gate mechanisms
- Learning long-term dependencies
- PyTorch implementation
1. LSTM (Long Short-Term Memory)
Problem: Vanishing Gradients in RNNs

h(100) ← W × W × ... × W × h(1)
              ↓
  the gradient converges to 0
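A tiny numeric illustration of why this happens (the recurrent weight 0.9 is an arbitrary stand-in for any factor with magnitude below 1):

w = 0.9           # stand-in for a recurrent weight with |w| < 1
print(w ** 100)   # ~2.66e-05: after 100 steps the gradient factor is almost 0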
Solution: The Cell State

LSTM = cell state (long-term memory) + hidden state (short-term memory)
LSTM Structure

The cell state C runs straight along the top of the cell: the forget gate scales it (×) and the input gate adds the new candidate (+). The gates f(t), i(t), g(t), o(t) are all computed from h(t-1) and x(t), and the output gate scales tanh(C(t)) to produce h(t).
Gate Equations

# Forget gate: how much of the previous memory to discard
f(t) = σ(W_f · [h(t-1), x(t)] + b_f)

# Input gate: how much of the new information to store
i(t) = σ(W_i · [h(t-1), x(t)] + b_i)

# Cell candidate: new candidate information
g(t) = tanh(W_g · [h(t-1), x(t)] + b_g)

# Cell state update
C(t) = f(t) × C(t-1) + i(t) × g(t)

# Output gate: how much of the cell state to output
o(t) = σ(W_o · [h(t-1), x(t)] + b_o)

# Hidden state
h(t) = o(t) × tanh(C(t))
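The equations map almost one-to-one onto code. Below is a from-scratch sketch of a single LSTM step (the tensor names and separate weight matrices are illustrative; nn.LSTMCell performs the same computation with the four gates fused into one weight matrix):

import torch

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_g, W_o, b_f, b_i, b_g, b_o):
    """One LSTM time step following the gate equations above."""
    hx = torch.cat([h_prev, x_t], dim=-1)    # [h(t-1), x(t)]

    f_t = torch.sigmoid(hx @ W_f.T + b_f)    # forget gate
    i_t = torch.sigmoid(hx @ W_i.T + b_i)    # input gate
    g_t = torch.tanh(hx @ W_g.T + b_g)       # cell candidate
    o_t = torch.sigmoid(hx @ W_o.T + b_o)    # output gate

    c_t = f_t * c_prev + i_t * g_t           # cell state update
    h_t = o_t * torch.tanh(c_t)              # hidden state
    return h_t, c_t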
2. GRU (Gated Recurrent Unit)

A Simplified Version of LSTM

GRU = Reset Gate + Update Gate
(the cell state and hidden state are merged into a single hidden state)
GRU Structure

The update gate z(t) interpolates between the previous hidden state h(t-1) and the candidate h̃(t). The reset gate r(t) decides how much of h(t-1) flows into the tanh layer that computes the candidate.
Gate Equations

# Update gate: ratio of previous state vs. new state
z(t) = σ(W_z · [h(t-1), x(t)] + b_z)

# Reset gate: how much of the previous state to forget
r(t) = σ(W_r · [h(t-1), x(t)] + b_r)

# Candidate hidden state
h̃(t) = tanh(W · [r(t) × h(t-1), x(t)] + b)

# Hidden state update
h(t) = (1 - z(t)) × h(t-1) + z(t) × h̃(t)
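The same kind of from-scratch sketch for one GRU step (again with illustrative names; note that PyTorch's built-in nn.GRUCell applies the reset gate to the hidden-to-hidden projection rather than to h(t-1) directly, a closely related variant of the textbook form shown here):

import torch

def gru_step(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One GRU time step following the gate equations above."""
    hx = torch.cat([h_prev, x_t], dim=-1)               # [h(t-1), x(t)]

    z_t = torch.sigmoid(hx @ W_z.T + b_z)               # update gate
    r_t = torch.sigmoid(hx @ W_r.T + b_r)               # reset gate

    hx_reset = torch.cat([r_t * h_prev, x_t], dim=-1)   # [r(t) × h(t-1), x(t)]
    h_cand = torch.tanh(hx_reset @ W_h.T + b_h)         # candidate hidden state

    h_t = (1 - z_t) * h_prev + z_t * h_cand             # interpolate old vs. new
    return h_t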
3. PyTorch LSTM/GRU

LSTM
import torch
import torch.nn as nn

lstm = nn.LSTM(
    input_size=10,
    hidden_size=20,
    num_layers=2,
    batch_first=True,
    dropout=0.1,
    bidirectional=False
)

# Forward pass
# output: hidden states at every time step
# (h_n, c_n): final (hidden, cell) states
x = torch.randn(32, 5, 10)  # (batch, seq_len, input_size)
output, (h_n, c_n) = lstm(x)
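A quick shape check, using the (batch=32, seq_len=5) input defined above; the sizes follow from batch_first=True, num_layers=2, and hidden_size=20:

print(output.shape)  # torch.Size([32, 5, 20]) -> (batch, seq_len, hidden_size)
print(h_n.shape)     # torch.Size([2, 32, 20]) -> (num_layers, batch, hidden_size)
print(c_n.shape)     # torch.Size([2, 32, 20]) -> same layout as h_n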
GRU

gru = nn.GRU(
    input_size=10,
    hidden_size=20,
    num_layers=2,
    batch_first=True
)

# Forward pass (no cell state)
output, h_n = gru(x)
4. LSTM Classifier

import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(
            embed_dim, hidden_dim,
            num_layers=2,
            batch_first=True,
            dropout=0.3,
            bidirectional=True
        )
        # Bidirectional, so the classifier input is hidden_dim * 2
        self.fc = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, x):
        # x: (batch, seq) - token indices
        embedded = self.embedding(x)

        # LSTM
        output, (h_n, c_n) = self.lstm(embedded)

        # Combine the last forward and backward hidden states
        # h_n: (num_layers * 2, batch, hidden)
        forward_last = h_n[-2]   # last layer, forward direction
        backward_last = h_n[-1]  # last layer, backward direction
        combined = torch.cat([forward_last, backward_last], dim=1)

        return self.fc(combined)
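A minimal usage sketch with made-up sizes (a 1,000-token vocabulary, a batch of 4 sequences of length 12), just to show the expected input and output shapes:

import torch

model = LSTMClassifier(vocab_size=1000, embed_dim=64, hidden_dim=128, num_classes=2)
tokens = torch.randint(0, 1000, (4, 12))   # (batch, seq) of token indices
logits = model(tokens)
print(logits.shape)                        # torch.Size([4, 2])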
5. Sequence Generation (Language Model)

import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        embedded = self.embedding(x)
        output, hidden = self.lstm(embedded, hidden)
        logits = self.fc(output)
        return logits, hidden

    def generate(self, start_token, max_len, temperature=1.0):
        self.eval()
        tokens = [start_token]
        hidden = None

        with torch.no_grad():
            for _ in range(max_len):
                x = torch.tensor([[tokens[-1]]])
                logits, hidden = self(x, hidden)

                # Temperature sampling: lower temperature -> sharper distribution
                probs = F.softmax(logits[0, -1] / temperature, dim=0)
                next_token = torch.multinomial(probs, 1).item()
                tokens.append(next_token)

        return tokens
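A short usage sketch with hypothetical sizes; token index 0 stands in for a start-of-sequence token:

lm = LSTMLanguageModel(vocab_size=1000, embed_dim=64, hidden_dim=128)
generated = lm.generate(start_token=0, max_len=20, temperature=0.8)
print(len(generated))   # 21: the start token plus 20 sampled tokens
# with untrained weights the tokens are random, but the sampling loop works end to end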
6. LSTM vs GRU Comparison

| Item | LSTM | GRU |
|---|---|---|
| Number of gates | 3 (f, i, o) | 2 (r, z) |
| States | cell + hidden | hidden only |
| Parameters | more | fewer |
| Training speed | slower | faster |
| Performance | better on complex patterns | similar or slightly lower |
Selection Guide

- LSTM: long sequences, complex dependencies
- GRU: faster training, limited compute or data (the parameter difference is easy to verify, as in the sketch below)
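A minimal sketch backing up the "fewer parameters" row, using the same illustrative sizes as section 3 (input_size=10, hidden_size=20). GRU has three weight blocks per layer where LSTM has four, so it ends up with roughly 3/4 of the parameters:

import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
gru = nn.GRU(input_size=10, hidden_size=20, batch_first=True)

print(sum(p.numel() for p in lstm.parameters()))  # 2560 = 4 * (10*20 + 20*20 + 20 + 20)
print(sum(p.numel() for p in gru.parameters()))   # 1920 = 3 * (10*20 + 20*20 + 20 + 20)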
7. Practical Tips

Initialization

# Hidden state initialization
# (device is assumed to be defined elsewhere, e.g. torch.device("cuda"))
def init_hidden(batch_size, hidden_size, num_layers, bidirectional):
    num_directions = 2 if bidirectional else 1
    h = torch.zeros(num_layers * num_directions, batch_size, hidden_size)
    c = torch.zeros(num_layers * num_directions, batch_size, hidden_size)
    return (h.to(device), c.to(device))
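A short usage sketch, assuming device = torch.device("cpu") and the 2-layer lstm and input x from section 3. PyTorch already defaults to a zero initial state, so calling init_hidden explicitly mainly matters when you want to control the device or carry state across batches:

import torch

device = torch.device("cpu")   # assumption for this sketch

h0_c0 = init_hidden(batch_size=32, hidden_size=20, num_layers=2, bidirectional=False)
output, (h_n, c_n) = lstm(x, h0_c0)   # equivalent to the default zero-initialized state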
Dropout Pattern

class LSTMWithDropout(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, dropout=0.5):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        output, (h_n, _) = self.lstm(x)
        # Apply dropout to the last hidden state before the classifier
        dropped = self.dropout(h_n[-1])
        return self.fc(dropped)
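A brief usage sketch with hypothetical sizes. Note that nn.LSTM's own dropout argument only applies between stacked layers (so it does nothing for a single-layer LSTM), which is why dropout is applied externally on the final hidden state here:

import torch

model = LSTMWithDropout(input_size=10, hidden_size=20, num_classes=3)
x = torch.randn(8, 15, 10)    # (batch, seq_len, input_size) - illustrative sizes
print(model(x).shape)         # torch.Size([8, 3])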
Summary

Key Concepts

- LSTM: maintains long-term memory via the cell state, 3 gates
- GRU: simplified LSTM, 2 gates
- Gates: control information flow (sigmoid × value)

Key Code

# LSTM
lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
output, (h_n, c_n) = lstm(x)

# GRU
gru = nn.GRU(input_size, hidden_size, batch_first=True)
output, h_n = gru(x)
Next Steps

16_Attention_Transformer.md covers Seq2Seq and Attention.