Foundation Model Paradigm
Learning Objectives
- Understand the definition and characteristics of Foundation Models
- Grasp the paradigm shift from traditional ML to Foundation Models
- Learn the concepts of in-context learning and emergent capabilities
- Know the lineage of the major Foundation Models
1. What Is a Foundation Model?
1.1 Definition
"Foundation Model" is a term proposed by Stanford HAI in 2021 for models with the following characteristics:
Defining characteristics of a Foundation Model:

1. Pre-trained on broad data
   - Tens of billions to trillions of text tokens
   - Hundreds of millions to billions of images
2. Adaptable to many tasks
   - A single model handles classification, generation, QA, translation, and more
   - Adapted through fine-tuning or prompting
3. General-purpose representations
   - Task-agnostic knowledge encoding
   - Maximizes transfer learning
1.2 Traditional ML vs. Foundation Models
Traditional Machine Learning pipeline:

    Task A ──▶ Data A ──▶ Model A ──▶ Deploy A
    Task B ──▶ Data B ──▶ Model B ──▶ Deploy B
    Task C ──▶ Data C ──▶ Model C ──▶ Deploy C

- Separate data collection for every task
- Separate model training for every task
- Limited knowledge sharing between tasks
Foundation Model pipeline:

                         ┌──────────────────┐
    Massive Data ──────▶ │ Foundation Model │
    (Web-scale)          └────────┬─────────┘
                                  │
                 ┌────────────────┼────────────────┐
                 ▼                ▼                ▼
              Adapt A          Adapt B          Adapt C
            (Fine-tune)       (Prompt)          (LoRA)
                 │                │                │
                 ▼                ▼                ▼
              Task A           Task B           Task C

- One large-scale pre-training run
- Many tasks served through lightweight adaptation
- Knowledge transfer between tasks is maximized
1.3 Types of Foundation Models
| Category | Representative Models | Input → Output |
|---|---|---|
| Language Models | GPT-4, LLaMA, Claude | Text → Text |
| Vision Models | ViT, DINOv2, SAM | Image → Features/Segmentation |
| Multimodal | CLIP, LLaVA, GPT-4V | Text + Image → Text |
| Generative | Stable Diffusion, DALL-E | Text → Image |
| Audio | Whisper, AudioLM | Audio → Text |
| Code | Codex, CodeLlama | Text → Code |
2. History of the Paradigm Shift
2.1 Timeline
| Year | Milestones |
|---|---|
| 2017 | Transformer (Vaswani et al.): introduces self-attention |
| 2018 | BERT (Google): bidirectional context via masked LM; GPT-1 (OpenAI): first large-scale autoregressive LM |
| 2019 | GPT-2 (1.5B params): "too dangerous to release"; T5: Text-to-Text Transfer Transformer |
| 2020 | GPT-3 (175B): in-context learning discovered; Scaling Laws paper (Kaplan et al.); ViT: Transformers applied to vision |
| 2021 | CLIP: connects vision and language; DALL-E: text-to-image generation; "Foundation Models" term coined (Stanford HAI) |
| 2022 | ChatGPT: brings LLMs to the mainstream; Chinchilla: compute-optimal scaling; Stable Diffusion: open-source image generation |
| 2023 | GPT-4: multimodal foundation model; LLaMA: open-source LLM revolution; SAM: promptable vision foundation model |
| 2024 | GPT-4o, Claude 3, Gemini 1.5: performance race; LLaMA 3, Mistral: open-source progress; Sora: video foundation model |
2.2 Major Turning Points
(1) GPT-3 and In-context Learning (2020)
GPT-3 demonstrated the potential of few-shot learning and became the catalyst for the paradigm shift:
# Traditional approach: fine-tune a separate model per task
# (illustrative pseudocode; load_pretrained / fine_tune are not real APIs)
model = load_pretrained("bert-base")
model = fine_tune(model, sentiment_dataset, epochs=3)
result = model.predict("This movie was great!")

# GPT-3 in-context learning: the prompt alone specifies the task
prompt = """
Classify the sentiment:
Text: "I love this product!" → Positive
Text: "Terrible experience." → Negative
Text: "This movie was great!" →
"""
result = gpt3.generate(prompt)  # "Positive"
(2) CLIP Connects Vision and Language (2021)
CLIP maps images and text into the same embedding space, enabling zero-shot classification:
# Zero-shot image classification with CLIP
import clip
import torch
from PIL import Image

model, preprocess = clip.load("ViT-B/32")

# Embed the image and the candidate labels into the same space
image = preprocess(Image.open("image.jpg")).unsqueeze(0)
text = clip.tokenize(["a dog", "a cat", "a bird"])
with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

# Classify by similarity (no task-specific training needed)
similarity = (image_features @ text_features.T).softmax(dim=-1)
# e.g. [0.95, 0.03, 0.02] → "a dog"
(3) ChatGPT and RLHF (2022)
ChatGPT used RLHF (Reinforcement Learning from Human Feedback) to generate responses aligned with human preferences:
ChatGPT training pipeline:

    Step 1: Pre-training (GPT-3.5 base)
            Next-token prediction on web text
              │
              ▼
    Step 2: Supervised Fine-tuning (SFT)
            Train on high-quality responses written by humans
              │
              ▼
    Step 3: Reward Model Training
            Train a model to predict human preferences between responses
              │
              ▼
    Step 4: RLHF with PPO
            Optimize the policy using the reward model as the reward signal
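Step 3 above, reward model training, typically fits pairwise human preferences. A toy sketch of the commonly used Bradley-Terry loss, with scalar scores standing in for reward-model outputs:

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected): small when the chosen
    response scores well above the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# A correct ranking (chosen scored higher) yields a small loss,
# an inverted ranking a large one.
print(round(preference_loss(2.0, 0.0), 4))  # 0.1269
print(round(preference_loss(0.0, 2.0), 4))  # 2.1269
```

Minimizing this loss over many labeled pairs is what turns raw scores into a usable reward signal for Step 4.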
3. In-context Learning (ICL)
3.1 Concept
In-context learning is the ability to perform a task using only the examples given in the prompt, without updating the model's weights.
Types of in-context learning:

    Zero-shot:  "Translate to French: Hello"
                → "Bonjour"

    One-shot:   "English: Hello → French: Bonjour
                 English: Goodbye →"
                → "Au revoir"

    Few-shot:   "English: Hello → French: Bonjour
                 English: Goodbye → French: Au revoir
                 English: Thank you → French: Merci
                 English: Good morning →"
                → "Bonjour" (or "Bon matin")
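The shot formats above can be assembled programmatically; `build_prompt` below is an illustrative helper, not a library API:

```python
def build_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Build a k-shot translation prompt from k demonstration pairs."""
    lines = ["Translate English to French."]
    for en, fr in examples:                      # k demonstrations => k-shot
        lines.append(f"English: {en} → French: {fr}")
    lines.append(f"English: {query} → French:")  # the model completes this line
    return "\n".join(lines)

few_shot = build_prompt(
    [("Hello", "Bonjour"), ("Goodbye", "Au revoir")], "Thank you"
)
print(few_shot)
```

Passing an empty example list gives the zero-shot variant of the same prompt.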
3.2 Why Does ICL Work? (Hypotheses)

Hypothesis 1: Bayesian inference
- The model infers the task distribution from the prompt examples
- P(output | input, examples) ∝ P(examples | task) × P(task)

Hypothesis 2: Implicit gradient descent
- The Transformer's attention implicitly performs gradient steps
- A mechanism similar to meta-learning

Hypothesis 3: Task vector retrieval
- Task vectors learned during pre-training are looked up at inference time
- The prompt activates the appropriate task vector
3.3 Few-shot Prompt Example
# Few-shot sentiment analysis
sentiment_prompt = """
Analyze the sentiment of the following reviews:
Review: "The food was delicious and the service was excellent!"
Sentiment: Positive
Review: "I waited for an hour and the waiter was rude."
Sentiment: Negative
Review: "It was okay, nothing special but not bad either."
Sentiment: Neutral
Review: "Best experience ever! Will definitely come back!"
Sentiment:"""

# API call (openai>=1.0 client interface)
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": sentiment_prompt}],
)
print(response.choices[0].message.content)  # "Positive"
4. Emergent Capabilities
4.1 Definition
Emergent capabilities are abilities that are absent in smaller models but appear abruptly once a model crosses a certain scale.
Characteristics of emergent capabilities:

    Performance
    100% ┤                        ╭──────────  large models
         │                       ╱
     50% ┤ · · · · · · · · · · ·╱· · · · · ·
         │                     ╱   phase transition
         │                    ╱    (sudden performance jump)
      0% ┼───────────────────╯                small models
         └─────┬─────┬─────┬─────┬─────┬──▶
             10B   50B  100B  200B  500B   Parameters
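The phase transition sketched above is often modeled qualitatively as a sharp sigmoid in model scale. A purely illustrative toy; the critical scale and sharpness are made-up numbers, not measurements:

```python
import math

def emergent_accuracy(params_b: float, critical: float = 60.0,
                      sharpness: float = 0.15) -> float:
    """Accuracy as a logistic function of parameter count (in billions)."""
    return 1.0 / (1.0 + math.exp(-sharpness * (params_b - critical)))

print(round(emergent_accuracy(10), 3))   # near zero: small models at chance
print(round(emergent_accuracy(200), 3))  # near one: the ability has emerged
```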
4.2 Representative Emergent Capabilities
| Capability | Description | Approx. scale of emergence |
|---|---|---|
| Arithmetic | Multi-digit addition/multiplication | ~10B params |
| Chain-of-Thought | Step-by-step reasoning | ~60B params |
| Word Unscrambling | Recovering scrambled words | ~60B params |
| Multi-step Math | Complex math problems | ~100B params |
| Code Generation | Complex code generation | ~100B params |
4.3 Chain-of-Thought (CoT) Prompting
# Without CoT: often fails
prompt_direct = """
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A:"""
# GPT-3 (small): "8" (wrong)

# With CoT: step-by-step reasoning improves accuracy
prompt_cot = """
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Let's think step by step.
Roger started with 5 tennis balls.
He bought 2 cans, each with 3 balls, so 2 × 3 = 6 balls.
Total: 5 + 6 = 11 tennis balls.
The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and
bought 6 more, how many apples do they have?
A: Let's think step by step."""
# GPT-3: "They started with 23, used 20, so 23 - 20 = 3.
# Then bought 6 more: 3 + 6 = 9. The answer is 9." (correct)
5. Core Components of Foundation Models
5.1 Architecture Comparison
Major architecture patterns:

Encoder-only (BERT, DINOv2)

    [CLS] Token1 Token2 ... TokenN [SEP]
        │   Bidirectional Attention   │
        └─▶ pooled / per-token representations

- Uses context from both directions
- Well suited to classification and embeddings

Decoder-only (GPT, LLaMA)

    Token1 ─▶ Token2 ─▶ Token3 ─▶ ...
        │  Causal (Masked) Attention  │
        └─▶ next-token prediction at every position

- Autoregressive generation
- Optimized for text generation

Encoder-Decoder (T5, BART)

    Encoder (bidirectional) ──▶ Decoder (causal)

- Separates input understanding from output generation
- Well suited to translation and summarization
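The three patterns differ chiefly in their attention masks. A small pure-Python sketch of the two mask shapes (an encoder-decoder model combines both):

```python
n = 4  # sequence length

# Encoder-only: full bidirectional mask; every token attends to every token.
bidirectional = [[1] * n for _ in range(n)]

# Decoder-only: causal mask; token i attends only to positions j <= i.
causal = [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

for row in causal:
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```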
5.2 Key Components

Core components of a Foundation Model:

1. Self-attention: query/key/value computation; learns relationships between all positions
2. Feed-forward network (FFN): acts as the model's knowledge store; holds most of the parameters
3. Positional encoding: injects order information (sinusoidal, learnable, RoPE, etc.)
4. Normalization: LayerNorm (BERT, GPT); RMSNorm (LLaMA), which is cheaper to compute
5. Activation function: GELU (BERT, GPT); SwiGLU (LLaMA), which gives better quality
6. Using Foundation Models
6.1 Getting Started with Hugging Face
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model (example: LLaMA-2-7B)
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # halves memory use
    device_map="auto"           # automatic GPU placement
)

# Generate text
prompt = "The future of AI is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    do_sample=True,
    top_p=0.9
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
6.2 Using a Vision Foundation Model
# DINOv2: general-purpose image feature extraction
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
model = AutoModel.from_pretrained("facebook/dinov2-base")

# Extract image embeddings
image = Image.open("image.jpg")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
features = outputs.last_hidden_state  # (1, num_patches + 1, 768)
cls_embedding = features[:, 0]        # CLS token: whole-image representation

# This embedding can be used for classification, retrieval, segmentation, etc.
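As a sketch of the retrieval use case just mentioned: rank a gallery by cosine similarity to a query embedding. The random vectors below are stand-ins for real DINOv2 CLS embeddings:

```python
import numpy as np

def cosine_rank(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Return gallery indices sorted by cosine similarity to the query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))  # most similar first

rng = np.random.default_rng(0)
gallery = rng.normal(size=(10, 768))              # 10 "images", dim 768
query = gallery[3] + 0.01 * rng.normal(size=768)  # near-duplicate of image 3
print(cosine_rank(query, gallery)[0])  # 3
```

Because the foundation model's embeddings are general-purpose, the same ranking function works unchanged for image search, deduplication, or k-NN classification.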
6.3 Using an API (OpenAI)
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain foundation models in simple terms."}
],
temperature=0.7,
max_tokens=500
)
print(response.choices[0].message.content)
7. Limitations and Challenges of Foundation Models
7.1 Current Limitations
| Limitation | Description | Mitigation attempts |
|---|---|---|
| Hallucination | Generates plausible but false information | RAG, grounding |
| Outdated knowledge | Unaware of information after the training cutoff | RAG, fine-tuning |
| Reasoning limits | Struggles with complex logical reasoning | CoT, self-consistency |
| High compute cost | Enormous training/inference cost | Quantization, distillation |
| Safety/alignment | Can generate harmful content | RLHF, Constitutional AI |
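The RAG mitigation from the table can be sketched end to end with a toy retriever; the word-overlap scoring and two-document corpus are illustrative, not a production setup:

```python
def retrieve(query: str, docs: list[str]) -> str:
    """Pick the document with the largest word overlap with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

docs = [
    "CLIP maps images and text into a shared embedding space.",
    "Chinchilla showed compute-optimal scaling of model and data size.",
]
question = "what did chinchilla show about scaling"
context = retrieve(question, docs)

# Ground the model's answer in the retrieved document:
prompt = f"Answer using only this context:\n{context}\n\nQ: {question}\nA:"
print(context.split()[0])  # Chinchilla
```

Grounding the prompt in retrieved text attacks both hallucination (the model is told to stick to the context) and outdated knowledge (the corpus can be refreshed without retraining).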
7.2 Research Directions

1. Efficient models: Mixture of Experts, sparse attention, quantization
2. Multimodal integration: unifying vision, language, audio, and code
3. Reasoning enhancement: test-time compute (o1), Tree of Thoughts
4. Continual learning: lifelong learning without catastrophic forgetting
5. Safety and alignment: Constitutional AI, red-teaming, interpretability
6. Agentic systems: tool use, multi-agent coordination, autonomous planning
Summary
Key Concepts
- Foundation Model: a general-purpose model pre-trained on broad data and adaptable to many tasks
- Paradigm shift: task-specific training → pre-train & adapt
- In-context learning: learning from the prompt alone, without weight updates
- Emergent capabilities: abilities that appear abruptly with scale
Next Steps
- 02_Scaling_Laws.md: the relationship between model size and performance
- 03_Emergent_Abilities.md: a deeper analysis of emergent capabilities
References
Key Papers
- Bommasani et al. (2021). "On the Opportunities and Risks of Foundation Models"
- Brown et al. (2020). "Language Models are Few-Shot Learners" (GPT-3)
- Radford et al. (2021). "Learning Transferable Visual Models From Natural Language Supervision" (CLIP)
- Wei et al. (2022). "Emergent Abilities of Large Language Models"