08. RNN Basics (Recurrent Neural Networks)

Learning Objectives

  • The concept and structure of recurrent neural networks
  • Processing sequence data
  • How to use PyTorch nn.RNN
  • Understanding the vanishing gradient problem

1. What Is an RNN?

Characteristics of Sequential Data

Time series: [1, 2, 3, 4, 5, ...]  - earlier values influence later values
Text: "I go to school"             - earlier words influence later words

Limitations of an MLP

  • Fixed input size
  • Ignores order information
  • Cannot handle variable-length sequences

How an RNN Addresses This

h(t) = tanh(W_xh × x(t) + W_hh × h(t-1) + b)

h(t): current hidden state
x(t): current input
h(t-1): previous hidden state
W_xh, W_hh: input-to-hidden and hidden-to-hidden weights
b: bias
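
The recurrence above can be sketched directly in code. This is a minimal illustration, not how nn.RNN is implemented internally; the sizes, the random weights, and the helper name rnn_step are assumptions made for the example.

```python
import torch

torch.manual_seed(0)
input_size, hidden_size = 3, 4

# Hypothetical small weights, named after the symbols in the formula
W_xh = torch.randn(hidden_size, input_size) * 0.1
W_hh = torch.randn(hidden_size, hidden_size) * 0.1
b = torch.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One recurrence step: h(t) = tanh(W_xh x(t) + W_hh h(t-1) + b)."""
    return torch.tanh(W_xh @ x_t + W_hh @ h_prev + b)

h = torch.zeros(hidden_size)            # initial hidden state
sequence = torch.randn(5, input_size)   # 5 time steps of input
for x_t in sequence:
    h = rnn_step(x_t, h)                # the new state depends on the previous one

print(h.shape)
```

Because tanh squashes its input, every entry of the hidden state stays within [-1, 1] no matter how long the sequence is.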

2. RNN Structure

Unrolling in Time

    x1      x2      x3      x4
    ↓       ↓       ↓       ↓
  ┌───┐   ┌───┐   ┌───┐   ┌───┐
  │ h │──►│ h │──►│ h │──►│ h │──► output
  └───┘   └───┘   └───┘   └───┘
    h0      h1      h2      h3

ํŒŒ๋ผ๋ฏธํ„ฐ ๊ณต์œ 

  • ๋ชจ๋“  ์‹œ๊ฐ„ ๋‹จ๊ณ„์—์„œ ๋™์ผํ•œ W_xh, W_hh ์‚ฌ์šฉ
  • ๊ฐ€๋ณ€ ๊ธธ์ด ์‹œํ€€์Šค ์ฒ˜๋ฆฌ ๊ฐ€๋Šฅ
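
To make parameter sharing concrete, the sketch below unrolls a single-layer nn.RNN by hand and compares the result with the module's own output for two sequence lengths. The helper name unroll and the sizes are assumptions for illustration; weight_ih_l0, weight_hh_l0, bias_ih_l0 and bias_hh_l0 are the layer-0 parameters that nn.RNN exposes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=3, hidden_size=4, batch_first=True)

def unroll(rnn, x):
    """Apply the single-layer recurrence by hand, reusing the same weights at every step."""
    h = torch.zeros(x.size(0), rnn.hidden_size)
    for t in range(x.size(1)):
        h = torch.tanh(
            x[:, t] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
            + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
        )
    return h

n_params = sum(p.numel() for p in rnn.parameters())
results = []
for seq_len in (5, 12):
    x = torch.randn(2, seq_len, 3)
    _, h_n = rnn(x)
    same = torch.allclose(unroll(rnn, x), h_n[0], atol=1e-5)
    results.append(same)
    print(seq_len, n_params, same)
```

The parameter count does not change with the sequence length, which is why one set of weights can handle inputs of any length.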

3. PyTorch RNN

Basic Usage

import torch
import torch.nn as nn

# Create the RNN
rnn = nn.RNN(
    input_size=10,    # input feature dimension
    hidden_size=20,   # hidden state dimension
    num_layers=2,     # number of stacked RNN layers
    batch_first=True  # input shape: (batch, seq, feature)
)

# Input shape: (batch_size, seq_len, input_size)
x = torch.randn(32, 15, 10)  # batch 32, sequence length 15, 10 features

# Forward pass
# output: hidden states of the last layer at every time step (batch, seq, hidden)
# h_n: final hidden state of each layer (layers, batch, hidden)
output, h_n = rnn(x)

print(f"output: {output.shape}")  # torch.Size([32, 15, 20])
print(f"h_n: {h_n.shape}")        # torch.Size([2, 32, 20])
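
The relationship between output and h_n is easy to mix up, so here is a small check. It assumes a unidirectional RNN fed with unpacked input, as in the example above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
x = torch.randn(32, 15, 10)
output, h_n = rnn(x)

last_step = output[:, -1, :]   # top layer at the final time step: (32, 20)
top_layer_final = h_n[-1]      # final state of the top layer:     (32, 20)
matches = torch.allclose(last_step, top_layer_final, atol=1e-6)
print(matches)
```

In this setting the two tensors hold the same values; h_n additionally contains the final state of the lower layer in h_n[0].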

Bidirectional RNN

rnn_bi = nn.RNN(
    input_size=10,
    hidden_size=20,
    num_layers=1,
    batch_first=True,
    bidirectional=True  # process the sequence in both directions
)

output, h_n = rnn_bi(x)
print(f"output: {output.shape}")  # torch.Size([32, 15, 40]) - forward + backward concatenated
print(f"h_n: {h_n.shape}")        # torch.Size([2, 32, 20]) - final state for each direction
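
The sketch below shows how the two directions are laid out in the output tensor. The variable names are assumptions for illustration, and concatenating the two final states is one common way to build a feature vector, not the only one.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_size = 20
rnn_bi = nn.RNN(input_size=10, hidden_size=hidden_size,
                batch_first=True, bidirectional=True)
x = torch.randn(32, 15, 10)
output, h_n = rnn_bi(x)

forward_out = output[:, :, :hidden_size]    # forward direction, all time steps
backward_out = output[:, :, hidden_size:]   # backward direction, all time steps

# The forward pass ends at the last time step, the backward pass ends at the first
fwd_ok = torch.allclose(forward_out[:, -1], h_n[0], atol=1e-6)
bwd_ok = torch.allclose(backward_out[:, 0], h_n[1], atol=1e-6)

# One possible summary feature for a classifier: both final states side by side
features = torch.cat([h_n[0], h_n[1]], dim=1)   # (32, 40)
print(fwd_ok, bwd_ok, features.shape)
```

Note that output[:, -1] is therefore not a full summary in the bidirectional case: its backward half has only seen the last input.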

4. Implementing an RNN Classifier

Sequence Classification Model

class RNNClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, num_layers=1):
        super().__init__()
        self.rnn = nn.RNN(
            input_size, hidden_size,
            num_layers=num_layers,
            batch_first=True
        )
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, seq, features)
        output, h_n = self.rnn(x)

        # Use the hidden state at the final time step
        # h_n[-1]: final hidden state of the last layer
        out = self.fc(h_n[-1])
        return out

Many-to-Many Structure

class RNNSeq2Seq(nn.Module):
    """Sequence → sequence: one output per time step."""
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        output, _ = self.rnn(x)
        # Apply the fully connected layer at every time step
        out = self.fc(output)  # (batch, seq, output_size)
        return out
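
For a many-to-many model, the loss is usually computed over every time step. The sketch below assumes a per-step classification task with random, hypothetical labels and uses plain nn.RNN and nn.Linear modules in place of the class above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, input_size, hidden_size, num_tags = 4, 6, 8, 16, 3

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
fc = nn.Linear(hidden_size, num_tags)

x = torch.randn(batch, seq_len, input_size)
targets = torch.randint(0, num_tags, (batch, seq_len))  # hypothetical per-step labels

output, _ = rnn(x)
logits = fc(output)  # (batch, seq, num_tags)

# CrossEntropyLoss expects (N, C) logits and (N,) targets, so merge batch and time
criterion = nn.CrossEntropyLoss()
loss = criterion(logits.reshape(-1, num_tags), targets.reshape(-1))
print(logits.shape, loss.item())
```

Flattening batch and time treats each time step as an independent classification example for the purpose of the loss.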

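5. The Vanishing Gradient Problem

Problem

For a long sequence:
h100 ← W_hh × W_hh × ... × W_hh × h1
                    ↑
            ~100 multiplications → gradients explode or vanish

Cause

Here |W_hh| stands loosely for the size of the recurrent weights (for example, the largest singular value).

  • |W_hh| > 1: gradients explode
  • |W_hh| < 1: gradients vanish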
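
A scalar simplification shows how quickly the repeated product changes scale. This sketch treats W_hh as a single number and ignores the tanh derivative (which is at most 1 and would only shrink the product further); the function name is an assumption for illustration.

```python
def repeated_factor(w, steps=100):
    """Product of `steps` copies of a scalar recurrent weight w."""
    grad = 1.0
    for _ in range(steps):
        grad *= w
    return grad

small = repeated_factor(0.9)   # about 2.7e-05: shrinks toward zero
large = repeated_factor(1.1)   # about 1.4e+04: grows rapidly
print(small, large)
```

Even weights close to 1 lead to very small or very large factors after 100 steps, which is why plain RNNs struggle with long-range dependencies.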

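Solutions

  1. Use LSTM/GRU (next lesson)
  2. Gradient clipping (limits exploding gradients; it does not fix vanishing ones)

# Gradient clipping: call after loss.backward() and before optimizer.step()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)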
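
Where the clipping call sits in a training step matters. The sketch below uses a stand-in model and a placeholder loss, both assumptions chosen only to produce gradients; the point is the order of the calls.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.RNN(input_size=4, hidden_size=8, batch_first=True)  # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(16, 30, 4)
output, _ = model(x)
loss = output.pow(2).mean()  # placeholder loss, for illustration only

optimizer.zero_grad()
loss.backward()

# Clip after backward() and before step(); the return value is the norm before clipping
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Recompute the overall gradient norm to confirm it now respects the limit
clipped_norm = torch.norm(torch.stack([p.grad.norm(2) for p in model.parameters()]), 2)
optimizer.step()
print(float(total_norm), float(clipped_norm))
```

If the gradient norm is already below max_norm, the gradients are left unchanged.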

6. Time Series Prediction Example

Sine Wave Prediction

import numpy as np

# Generate the data
def generate_sin_data(seq_len=50, n_samples=1000):
    X = []
    y = []
    for _ in range(n_samples):
        start = np.random.uniform(0, 2*np.pi)
        seq = np.sin(np.linspace(start, start + 4*np.pi, seq_len + 1))
        X.append(seq[:-1].reshape(-1, 1))
        y.append(seq[-1])
    return np.array(X), np.array(y)

X, y = generate_sin_data()
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32)

# Model
class SinPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(1, 32, batch_first=True)
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        _, h_n = self.rnn(x)
        return self.fc(h_n[-1]).squeeze(-1)  # (batch,); squeeze(-1) keeps the batch dim when batch size is 1
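
The lesson stops at the model definition, so here is one possible training loop. It is a sketch under assumptions: full-batch training, the Adam optimizer, MSE loss, 200 samples and 30 epochs are arbitrary choices for a quick run, not settings taken from the lesson.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
np.random.seed(0)

def generate_sin_data(seq_len=50, n_samples=200):
    """Each sample is seq_len points of a sine wave; the target is the next point."""
    X, y = [], []
    for _ in range(n_samples):
        start = np.random.uniform(0, 2 * np.pi)
        seq = np.sin(np.linspace(start, start + 4 * np.pi, seq_len + 1))
        X.append(seq[:-1].reshape(-1, 1))
        y.append(seq[-1])
    return (torch.tensor(np.array(X), dtype=torch.float32),
            torch.tensor(np.array(y), dtype=torch.float32))

class SinPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(1, 32, batch_first=True)
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        _, h_n = self.rnn(x)
        return self.fc(h_n[-1]).squeeze(-1)

X, y = generate_sin_data()
model = SinPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

losses = []
for epoch in range(30):  # the epoch count is an arbitrary choice
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

For larger datasets, a DataLoader with mini-batches would be the more usual choice than the full-batch update shown here.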

7. Text Classification Example

Character-Level RNN

class CharRNN(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.rnn = nn.RNN(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, seq) - character indices
        embedded = self.embedding(x)  # (batch, seq, embed)
        output, h_n = self.rnn(embedded)
        out = self.fc(h_n[-1])
        return out

# Example
vocab_size = 27  # a-z + space
model = CharRNN(vocab_size, embed_size=32, hidden_size=64, num_classes=5)
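
The model expects character indices, so text has to be encoded first. The sketch below is one simple way to do that for the 27-symbol vocabulary; the helper encode and the choice to drop unknown characters are assumptions for illustration.

```python
import string
import torch
import torch.nn as nn

alphabet = string.ascii_lowercase + " "   # 27 symbols: a-z plus space
char_to_idx = {ch: i for i, ch in enumerate(alphabet)}

def encode(text):
    """Map a string to a 1-D tensor of character indices (unknown characters are dropped)."""
    return torch.tensor([char_to_idx[ch] for ch in text.lower() if ch in char_to_idx])

x = encode("hello world").unsqueeze(0)    # add a batch dimension: (1, 11)
embedding = nn.Embedding(len(alphabet), 32)
embedded = embedding(x)                   # (1, 11, 32)
print(x.shape, embedded.shape)
```

A tensor shaped like x is what the forward method of the character-level model takes as input.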

8. Caveats

Input Shape

# batch_first=True  → (batch, seq, feature)
# batch_first=False → (seq, batch, feature)  # default
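
The two layouts can be compared side by side. In this sketch the second RNN copies the first one's weights so the outputs are directly comparable; the variable names are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn_bf = nn.RNN(10, 20, batch_first=True)
rnn_sf = nn.RNN(10, 20)                       # batch_first=False (default)
rnn_sf.load_state_dict(rnn_bf.state_dict())   # share weights so outputs are comparable

x = torch.randn(32, 15, 10)                   # (batch, seq, feature)
out_bf, h_bf = rnn_bf(x)
out_sf, h_sf = rnn_sf(x.transpose(0, 1))      # (seq, batch, feature)

same_values = torch.allclose(out_bf, out_sf.transpose(0, 1), atol=1e-6)
print(out_bf.shape, out_sf.shape)   # output follows the input layout
print(h_bf.shape, h_sf.shape)       # h_n is (layers, batch, hidden) in both cases
```

batch_first only affects the input and output tensors; h_n keeps the same layout either way.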

Variable-Length Sequences

from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Padded sequences and their true lengths
padded_seqs = ...  # (batch, max_len, features)
lengths = ...      # true length of each sequence (a list or a CPU tensor)

# Pack (so the RNN skips the padding)
packed = pack_padded_sequence(padded_seqs, lengths,
                               batch_first=True, enforce_sorted=False)
output, h_n = rnn(packed)

# Unpack
output_padded, _ = pad_packed_sequence(output, batch_first=True)
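
A concrete run of the packing workflow may help. The three random sequences and their lengths are hypothetical, and pad_sequence is used here only to build the padded batch.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence, pad_sequence

torch.manual_seed(0)
rnn = nn.RNN(input_size=3, hidden_size=5, batch_first=True)

# Three hypothetical sequences of different lengths
seqs = [torch.randn(4, 3), torch.randn(2, 3), torch.randn(6, 3)]
lengths = torch.tensor([len(s) for s in seqs])        # [4, 2, 6]
padded_seqs = pad_sequence(seqs, batch_first=True)    # (3, 6, 3), zero-padded

packed = pack_padded_sequence(padded_seqs, lengths,
                              batch_first=True, enforce_sorted=False)
packed_out, h_n = rnn(packed)
output_padded, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

# h_n holds each sequence's state at its own last real step, in the original batch order
idx = torch.arange(len(seqs))
last_real = output_padded[idx, lengths - 1]
matches = torch.allclose(last_real, h_n[-1], atol=1e-6)
print(output_padded.shape, out_lengths.tolist(), matches)
```

Without packing, h_n for the shorter sequences would be computed after running over padding, which is usually not what a classifier should see.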

9. Comparison of RNN Variants

Model        Advantages                     Disadvantages
Simple RNN   Simple, fast                   Hard to train on long sequences
LSTM         Learns long-term dependencies  Complex, slow
GRU          Similar to LSTM, simpler       -
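
One rough way to see the "simple versus complex" trade-off in the table is to count parameters for the same layer sizes. The sizes below are arbitrary, and parameter count is only a proxy for speed.

```python
import torch.nn as nn

def count_params(module):
    """Total number of trainable values in a module."""
    return sum(p.numel() for p in module.parameters())

sizes = dict(input_size=10, hidden_size=20)
counts = {
    "RNN": count_params(nn.RNN(**sizes)),
    "GRU": count_params(nn.GRU(**sizes)),
    "LSTM": count_params(nn.LSTM(**sizes)),
}
print(counts)  # {'RNN': 640, 'GRU': 1920, 'LSTM': 2560}
```

For equal sizes, a GRU layer has three times and an LSTM layer four times the parameters of a simple RNN layer, because they add gate weights of the same shape.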

Summary

Key Concepts

  1. Recurrent structure: the previous state influences the next computation
  2. Parameter sharing: the same weights are used at every time step
  3. Gradient problem: long sequences are hard to learn

Key Code

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
output, h_n = rnn(x)  # output: states at all time steps, h_n: final state

๋‹ค์Œ ๋‹จ๊ณ„

14_LSTM_GRU.md์—์„œ LSTM๊ณผ GRU๋ฅผ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค.
