23. Advanced RAG¶
Overview¶
This chapter moves beyond basic RAG to more sophisticated retrieval and generation strategies, covering recent techniques such as Agentic RAG, multi-hop reasoning, HyDE, and RAPTOR.
1. RAG Limitations and Advanced Techniques¶
1.1 Limitations of Basic RAG¶
Problems with basic RAG:

1. Single-shot retrieval
   - One retrieval pass is insufficient for complex questions
   - Multi-step reasoning is needed

2. Query-document mismatch
   - Questions and documents are written in different styles
   - Embedding similarity alone has limits

3. Context length limits
   - Many relevant documents are hard to process at once
   - Important information may be dropped

4. Freshness and accuracy
   - Information can be outdated
   - Reliability is hard to verify
1.2 Taxonomy of Advanced RAG Techniques¶
Advanced RAG techniques:

Pre-Retrieval
├── Query Transformation (HyDE, Query Expansion)
└── Query Routing

Retrieval
├── Hybrid Search (Dense + Sparse)
├── Multi-step Retrieval
└── Hierarchical Retrieval (RAPTOR)

Post-Retrieval
├── Reranking
├── Context Compression
└── Self-Reflection

Generation
├── Chain-of-Thought RAG
└── Agentic RAG
2. Query Transformation¶
2.1 HyDE (Hypothetical Document Embeddings)¶
The HyDE idea:

Query: "What is the capital of France?"

Standard RAG: search directly with the query embedding
  (query and documents differ in style)

HyDE: have the LLM generate a hypothetical document, then search
  Query → "Paris is the capital of France..."
        → search with this hypothetical document's embedding
  (document-to-document style match)
from langchain.chains import HypotheticalDocumentEmbedder
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings

class HyDERetriever:
    """HyDE retriever"""

    def __init__(self, llm, embeddings, vectorstore):
        self.llm = llm
        self.embeddings = embeddings
        self.vectorstore = vectorstore

    def generate_hypothetical_document(self, query: str) -> str:
        """Generate a hypothetical document for the query"""
        prompt = f"""Write a short passage that would answer the following question.
The passage should be factual and informative.

Question: {query}

Passage:"""
        return self.llm.invoke(prompt)

    def retrieve(self, query: str, k: int = 5) -> list:
        """HyDE retrieval"""
        # 1. Generate a hypothetical document
        hypothetical_doc = self.generate_hypothetical_document(query)
        # 2. Embed the hypothetical document
        doc_embedding = self.embeddings.embed_query(hypothetical_doc)
        # 3. Search for similar real documents
        results = self.vectorstore.similarity_search_by_vector(
            doc_embedding, k=k
        )
        return results

# LangChain's built-in HyDE
def setup_hyde_chain():
    base_embeddings = OpenAIEmbeddings()
    llm = OpenAI(temperature=0)
    embeddings = HypotheticalDocumentEmbedder.from_llm(
        llm, base_embeddings, "web_search"
    )
    return embeddings
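To see the HyDE flow end to end without API keys, here is a minimal sketch with stub components. `StubLLM`, `StubEmbeddings`, `StubVectorStore`, and the toy keyword-count "embedding" are all illustrative stand-ins, not real models:

```python
# Minimal HyDE flow with stub components (no API calls).
from dataclasses import dataclass

@dataclass
class Doc:
    page_content: str

class StubLLM:
    def invoke(self, prompt: str) -> str:
        # A real LLM would write a passage that answers the question.
        return "Paris is the capital of France, located on the Seine."

class StubEmbeddings:
    def embed_query(self, text: str) -> list:
        # Toy "embedding": keyword counts instead of a neural encoder.
        words = [w.strip(".,?") for w in text.lower().split()]
        return [float(words.count(w)) for w in ("paris", "capital", "france")]

class StubVectorStore:
    def __init__(self, docs, embeddings):
        self.docs = docs
        self.embs = [embeddings.embed_query(d.page_content) for d in docs]

    def similarity_search_by_vector(self, vec, k=5):
        # Rank stored documents by dot product with the given vector.
        dot = lambda a, b: sum(x * y for x, y in zip(a, b))
        order = sorted(range(len(self.docs)),
                       key=lambda i: dot(self.embs[i], vec), reverse=True)
        return [self.docs[i] for i in order[:k]]

emb = StubEmbeddings()
store = StubVectorStore(
    [Doc("Paris is the capital and largest city of France."),
     Doc("Berlin is known for its museums.")],
    emb,
)

# HyDE: embed the generated passage instead of the raw query.
hypothetical = StubLLM().invoke("What is the capital of France?")
top = store.similarity_search_by_vector(emb.embed_query(hypothetical), k=1)
print(top[0].page_content)  # → Paris is the capital and largest city of France.
```

The point of the stub is the last two lines: retrieval runs against the hypothetical passage's embedding, which matches document style better than the raw question would.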
2.2 Query Expansion¶
class QueryExpander:
    """Query expansion"""

    def __init__(self, llm):
        self.llm = llm

    def expand_query(self, query: str, num_variations: int = 3) -> list:
        """Expand a query into several variations"""
        prompt = f"""Generate {num_variations} different versions of the following question.
Each version should ask the same thing but use different words or perspectives.

Original question: {query}

Variations:
1."""
        response = self.llm.invoke(prompt)
        # Parse the numbered list
        variations = [query]  # include the original
        for line in response.split("\n"):
            line = line.strip()
            if line and line[0].isdigit():
                # "1. question" format
                variation = line.split(".", 1)[-1].strip()
                variations.append(variation)
        return variations[:num_variations + 1]

    def retrieve_with_expansion(
        self,
        query: str,
        retriever,
        k: int = 5
    ) -> list:
        """Retrieve with the expanded queries"""
        variations = self.expand_query(query)
        all_docs = []
        seen = set()
        for variation in variations:
            docs = retriever.get_relevant_documents(variation)
            for doc in docs:
                doc_id = hash(doc.page_content)
                if doc_id not in seen:
                    seen.add(doc_id)
                    all_docs.append(doc)
        # Return the top k (rank with RRF or another fusion method)
        return all_docs[:k]
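The dedup step above keeps documents in first-seen order; the final comment mentions Reciprocal Rank Fusion (RRF) as a better way to merge the per-variation rankings. A generic RRF sketch (k=60 is the conventional constant; the doc ids are illustrative):

```python
# Reciprocal Rank Fusion: merge several ranked lists into one ranking.
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Rankings retrieved for three query variations (ids only):
fused = reciprocal_rank_fusion([
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
])
print(fused[0])  # → d2 (near the top of every list)
```

RRF rewards documents that rank well across many query variants without needing comparable similarity scores, which is why it pairs naturally with query expansion.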
3. Agentic RAG¶
3.1 Concept¶
Agentic RAG:

An LLM agent uses retrieval tools dynamically.

Agent loop:
1. Analyze the question
2. Decide what information is needed
3. Call retrieval tools (selectively, possibly repeatedly)
4. Evaluate the results
5. More retrieval needed? → repeat
6. Generate the final answer

vs. basic RAG:
Query → Retrieve → Generate (fixed pipeline)
3.2 Implementation¶
from langchain.agents import Tool, AgentExecutor, create_react_agent
from langchain.prompts import PromptTemplate

class AgenticRAG:
    """Agentic RAG system"""

    def __init__(self, llm, vectorstore, web_search=None):
        self.llm = llm
        self.vectorstore = vectorstore
        self.web_search = web_search
        self.tools = self._setup_tools()
        self.agent = self._create_agent()

    def _setup_tools(self) -> list:
        """Set up the tools"""
        tools = [
            Tool(
                name="search_knowledge_base",
                func=self._search_kb,
                description="Search the internal knowledge base for relevant information. Use this for company-specific or domain-specific questions."
            ),
            Tool(
                name="search_web",
                func=self._search_web,
                description="Search the web for current information. Use this for recent events or general knowledge."
            ),
            Tool(
                name="lookup_specific",
                func=self._lookup_specific,
                description="Look up specific facts or definitions. Use this when you need precise information."
            )
        ]
        return tools

    def _search_kb(self, query: str) -> str:
        """Search the knowledge base"""
        docs = self.vectorstore.similarity_search(query, k=3)
        return "\n\n".join([doc.page_content for doc in docs])

    def _search_web(self, query: str) -> str:
        """Web search (requires an external API)"""
        if self.web_search:
            return self.web_search.run(query)
        return "Web search not available."

    def _lookup_specific(self, query: str) -> str:
        """Look up a specific fact"""
        docs = self.vectorstore.similarity_search(query, k=1)
        if docs:
            return docs[0].page_content
        return "No specific information found."

    def _create_agent(self):
        """Create a ReAct agent"""
        # Note: create_react_agent expects {tools}, {tool_names},
        # {input}, and {agent_scratchpad} in the prompt template.
        prompt = PromptTemplate.from_template("""Answer the following question using the available tools.
Think step by step about what information you need.

Question: {input}

You have access to these tools:
{tools}

Use the following format:

Thought: What do I need to find out?
Action: the tool to use, one of [{tool_names}]
Action Input: the input to the tool
Observation: the result of the tool
... (repeat as needed)
Thought: I now have enough information
Final Answer: the final answer

Begin!

{agent_scratchpad}""")
        agent = create_react_agent(self.llm, self.tools, prompt)
        return AgentExecutor(agent=agent, tools=self.tools, verbose=True)

    def query(self, question: str) -> str:
        """Handle a question"""
        result = self.agent.invoke({"input": question})
        return result["output"]

# Usage example
def agentic_rag_example():
    from langchain.llms import OpenAI
    from langchain.vectorstores import Chroma

    llm = OpenAI(temperature=0)
    vectorstore = Chroma(...)  # requires configuration
    rag = AgenticRAG(llm, vectorstore)

    # A complex question
    answer = rag.query(
        "Compare our company's revenue growth in 2023 with the industry average"
    )
    print(answer)
4. Multi-hop Reasoning¶
4.1 Concept¶
Multi-hop reasoning:

Question: "What is the population of Biden's hometown?"

Hop 1: "Where is Biden's hometown?"          → "Scranton, PA"
Hop 2: "What is the population of Scranton?" → "76,328"

Final answer: "76,328"

A single retrieval pass rarely surfaces the answer directly.
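The hop sequence above can be made concrete with a toy knowledge base. The dict lookup stands in for vector retrieval and the string formatting for an LLM-generated follow-up question; the figures come from the example, not verified data:

```python
# Toy two-hop lookup: each hop's answer feeds the next hop's question.
knowledge_base = {
    "Where is Biden's hometown?": "Scranton, PA",
    "What is the population of Scranton?": "76,328",
}

def answer_two_hop(first_hop: str, second_hop_template: str) -> str:
    # Hop 1: retrieve the intermediate fact.
    intermediate = knowledge_base[first_hop]
    city = intermediate.split(",")[0]          # "Scranton"
    # Hop 2: plug the intermediate fact into the follow-up question.
    follow_up = second_hop_template.format(city)
    return knowledge_base[follow_up]

population = answer_two_hop("Where is Biden's hometown?",
                            "What is the population of {}?")
print(population)  # → 76,328
```

A single-shot retriever would have to match "population of Biden's hometown" against a document that never mentions Biden, which is exactly why the intermediate hop is needed.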
4.2 Implementation¶
class MultiHopRAG:
    """Multi-hop reasoning RAG"""

    def __init__(self, llm, retriever, max_hops: int = 3):
        self.llm = llm
        self.retriever = retriever
        self.max_hops = max_hops

    def decompose_question(self, question: str) -> list:
        """Break a question into sub-questions"""
        prompt = f"""Break down the following complex question into simpler sub-questions.
Each sub-question should be answerable independently.

Question: {question}

Sub-questions (one per line):"""
        response = self.llm.invoke(prompt)
        sub_questions = [q.strip() for q in response.split("\n") if q.strip()]
        return sub_questions

    def answer_with_hops(self, question: str) -> dict:
        """Answer through multi-step reasoning"""
        reasoning_chain = []
        context = ""
        for hop in range(self.max_hops):
            # Decide the next question from the current context
            if hop == 0:
                current_query = question
            else:
                current_query = self._generate_follow_up(
                    question, context, reasoning_chain
                )
            if current_query is None:
                break
            # Retrieve
            docs = self.retriever.get_relevant_documents(current_query)
            new_context = "\n".join([doc.page_content for doc in docs])
            # Generate an intermediate answer
            intermediate_answer = self._generate_intermediate_answer(
                current_query, new_context
            )
            reasoning_chain.append({
                "hop": hop + 1,
                "query": current_query,
                "answer": intermediate_answer
            })
            context += f"\n{intermediate_answer}"
            # Check whether we have enough information
            if self._has_enough_info(question, context):
                break
        # Final answer
        final_answer = self._generate_final_answer(question, reasoning_chain)
        return {
            "question": question,
            "reasoning_chain": reasoning_chain,
            "final_answer": final_answer
        }

    def _generate_follow_up(self, original_q, context, chain) -> str:
        """Generate the follow-up question"""
        chain_text = "\n".join([
            f"Q: {step['query']}\nA: {step['answer']}"
            for step in chain
        ])
        prompt = f"""Based on the original question and what we've learned so far,
what additional information do we need?

Original question: {original_q}

What we've found:
{chain_text}

If we have enough information to answer, respond with "DONE".
Otherwise, provide the next question to search for:"""
        response = self.llm.invoke(prompt)
        if "DONE" in response.upper():
            return None
        return response.strip()

    def _generate_intermediate_answer(self, query, context) -> str:
        """Generate an intermediate answer"""
        prompt = f"""Based on the following context, answer the question briefly.

Context: {context}

Question: {query}

Answer:"""
        return self.llm.invoke(prompt)

    def _has_enough_info(self, question, context) -> bool:
        """Check whether the gathered information is sufficient"""
        prompt = f"""Can you answer the following question based on this information?

Question: {question}

Information: {context}

Answer YES or NO:"""
        response = self.llm.invoke(prompt)
        return "YES" in response.upper()

    def _generate_final_answer(self, question, chain) -> str:
        """Generate the final answer"""
        chain_text = "\n".join([
            f"Step {step['hop']}: {step['query']} → {step['answer']}"
            for step in chain
        ])
        prompt = f"""Based on the reasoning chain below, provide a final answer.

Question: {question}

Reasoning:
{chain_text}

Final Answer:"""
        return self.llm.invoke(prompt)
5. RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval)¶
5.1 Concept¶
RAPTOR structure:

Level 3 (top-level summary)
           [ Abstract Summary ]
                    ↑
Level 2 (cluster summaries)
  [Summary1]    [Summary2]    [Summary3]
       ↑             ↑             ↑
Level 1 (chunk clusters)
  [C1][C2][C3]  [C4][C5][C6]  [C7][C8][C9]
       ↑             ↑             ↑
Level 0 (original chunks)
  [Chunk1][Chunk2]...[ChunkN]

Retrieval: search all levels at once to pull in information at multiple
levels of abstraction.
5.2 Implementation¶
from sklearn.cluster import KMeans
from langchain.schema import Document
import numpy as np

class RAPTOR:
    """RAPTOR hierarchical retrieval"""

    def __init__(self, llm, embeddings, num_levels: int = 3):
        self.llm = llm
        self.embeddings = embeddings
        self.num_levels = num_levels
        self.tree = {}

    def build_tree(self, documents: list, cluster_size: int = 5):
        """Build the RAPTOR tree"""
        # Level 0: original chunks
        self.tree[0] = documents
        current_docs = documents
        for level in range(1, self.num_levels):
            # Compute embeddings
            texts = [doc.page_content for doc in current_docs]
            embeddings = self.embeddings.embed_documents(texts)
            # Cluster
            n_clusters = max(len(current_docs) // cluster_size, 1)
            kmeans = KMeans(n_clusters=n_clusters, n_init=10)
            clusters = kmeans.fit_predict(embeddings)
            # Summarize each cluster
            summaries = []
            for cluster_id in range(n_clusters):
                cluster_docs = [
                    doc for doc, c in zip(current_docs, clusters)
                    if c == cluster_id
                ]
                summary = self._summarize_cluster(cluster_docs)
                summaries.append(summary)
            self.tree[level] = summaries
            current_docs = summaries

    def _summarize_cluster(self, docs: list) -> Document:
        """Summarize one cluster"""
        combined_text = "\n\n".join([doc.page_content for doc in docs])
        prompt = f"""Summarize the following texts into a concise summary that captures the key information.

Texts:
{combined_text}

Summary:"""
        summary = self.llm.invoke(prompt)
        # Wrap in a Document object
        return Document(page_content=summary)

    def retrieve(self, query: str, k_per_level: int = 2) -> list:
        """Hierarchical retrieval"""
        all_results = []
        query_embedding = np.array(self.embeddings.embed_query(query))
        for level, docs in self.tree.items():
            # Search each level
            texts = [doc.page_content for doc in docs]
            doc_embeddings = np.array(self.embeddings.embed_documents(texts))
            # Cosine similarity (normalize before the dot product)
            doc_norms = np.linalg.norm(doc_embeddings, axis=1)
            similarities = doc_embeddings @ query_embedding / (
                doc_norms * np.linalg.norm(query_embedding) + 1e-10
            )
            top_indices = np.argsort(similarities)[-k_per_level:]
            for idx in top_indices:
                all_results.append({
                    "level": level,
                    "document": docs[idx],
                    "score": similarities[idx]
                })
        # Sort by score
        all_results.sort(key=lambda x: x["score"], reverse=True)
        return all_results
6. ColBERT (Contextualized Late Interaction)¶
6.1 Concept¶
ColBERT vs. dense retrieval:

Dense retrieval (bi-encoder):
  Query → [CLS] embedding
  Doc   → [CLS] embedding
  Score = dot(query_emb, doc_emb)
  Problem: a single vector struggles to capture complex meaning

ColBERT (late interaction):
  Query → [q1, q2, ..., qn]  (per-token embeddings)
  Doc   → [d1, d2, ..., dm]  (per-token embeddings)
  Score = Σᵢ maxⱼ sim(qᵢ, dⱼ)
  Advantage: token-level matching gives more precise retrieval
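The late-interaction score Σᵢ maxⱼ sim(qᵢ, dⱼ) is straightforward to compute directly. A NumPy sketch with made-up token vectors (a real ColBERT model produces normalized per-token BERT embeddings):

```python
import numpy as np

def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Sum over query tokens of the max similarity to any doc token.

    query_tokens: (n, dim), doc_tokens: (m, dim); rows are L2-normalized
    so the dot product is cosine similarity.
    """
    sim = query_tokens @ doc_tokens.T      # (n, m) token-pair similarities
    return float(sim.max(axis=1).sum())    # max over doc tokens, sum over query

def normalize(rows: np.ndarray) -> np.ndarray:
    return rows / np.linalg.norm(rows, axis=1, keepdims=True)

q = normalize(np.array([[1.0, 0.0], [0.0, 1.0]]))        # 2 query tokens
d_good = normalize(np.array([[0.9, 0.1], [0.1, 0.9]]))   # matches both tokens
d_bad = normalize(np.array([[1.0, 0.0], [1.0, 0.0]]))    # matches only one

print(maxsim_score(q, d_good) > maxsim_score(q, d_bad))  # → True
```

Because every query token gets its own best match, a document that covers only one aspect of the query (`d_bad`) scores lower than one that covers both (`d_good`), which a single pooled vector can blur together.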
6.2 Usage¶
from colbert import Indexer, Searcher
from colbert.infra import Run, RunConfig, ColBERTConfig

class ColBERTRetriever:
    """ColBERT retriever"""

    def __init__(self, index_name: str = "my_index"):
        self.index_name = index_name
        self.config = ColBERTConfig(
            nbits=2,
            doc_maxlen=300,
            query_maxlen=32
        )

    def build_index(self, documents: list, collection_path: str):
        """Build the index"""
        # Write the documents to a collection file
        with open(collection_path, 'w') as f:
            for doc in documents:
                f.write(doc + "\n")
        with Run().context(RunConfig(nranks=1)):
            indexer = Indexer(
                checkpoint="colbert-ir/colbertv2.0",
                config=self.config
            )
            indexer.index(
                name=self.index_name,
                collection=collection_path
            )

    def search(self, query: str, k: int = 10) -> list:
        """Search"""
        with Run().context(RunConfig(nranks=1)):
            searcher = Searcher(index=self.index_name)
            results = searcher.search(query, k=k)
        return results

# RAGatouille (an easier ColBERT wrapper)
def colbert_with_ragatouille():
    from ragatouille import RAGPretrainedModel

    rag = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
    # Indexing
    rag.index(
        collection=[
            "Document 1 content...",
            "Document 2 content..."
        ],
        index_name="my_index"
    )
    # Search
    results = rag.search("my query", k=5)
    return results
7. Self-RAG (Self-Reflective RAG)¶
class SelfRAG:
    """Self-RAG: self-reflective RAG"""

    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever

    def query(self, question: str) -> dict:
        """Self-RAG query"""
        # 1. Decide whether retrieval is needed
        needs_retrieval = self._assess_retrieval_need(question)
        if not needs_retrieval:
            # Answer directly without retrieval
            answer = self._generate_without_retrieval(question)
            return {"answer": answer, "retrieval_used": False}
        # 2. Retrieve
        docs = self.retriever.get_relevant_documents(question)
        # 3. Judge relevance (per document)
        relevant_docs = []
        for doc in docs:
            if self._is_relevant(question, doc):
                relevant_docs.append(doc)
        # 4. Generate an answer
        answer = self._generate_with_context(question, relevant_docs)
        # 5. Evaluate answer quality
        is_supported = self._check_support(answer, relevant_docs)
        is_useful = self._check_usefulness(question, answer)
        # 6. Retry if needed
        if not is_supported or not is_useful:
            answer = self._refine_answer(question, relevant_docs, answer)
        return {
            "answer": answer,
            "retrieval_used": True,
            "relevant_docs": relevant_docs,
            "is_supported": is_supported,
            "is_useful": is_useful
        }

    def _assess_retrieval_need(self, question: str) -> bool:
        """Judge whether retrieval is needed"""
        prompt = f"""Determine if external knowledge is needed to answer this question.

Question: {question}

Answer YES if retrieval is needed, NO if you can answer from general knowledge:"""
        response = self.llm.invoke(prompt)
        return "YES" in response.upper()

    def _is_relevant(self, question: str, doc) -> bool:
        """Judge document relevance"""
        prompt = f"""Is this document relevant to the question?

Question: {question}

Document: {doc.page_content[:500]}

Answer RELEVANT or IRRELEVANT:"""
        response = self.llm.invoke(prompt)
        # Test for the negative label: "IRRELEVANT" contains "RELEVANT"
        # as a substring, so a plain membership test would always pass.
        return "IRRELEVANT" not in response.upper()

    def _check_support(self, answer: str, docs: list) -> bool:
        """Check whether the answer is backed by the documents"""
        context = "\n".join([doc.page_content for doc in docs])
        prompt = f"""Is this answer supported by the given context?

Context: {context}

Answer: {answer}

Respond SUPPORTED or NOT_SUPPORTED:"""
        response = self.llm.invoke(prompt)
        # Same substring pitfall: test for the negative label.
        return "NOT_SUPPORTED" not in response.upper()

    def _check_usefulness(self, question: str, answer: str) -> bool:
        """Check whether the answer addresses the question"""
        prompt = f"""Does this answer actually address the question?

Question: {question}

Answer: {answer}

Respond USEFUL or NOT_USEFUL:"""
        response = self.llm.invoke(prompt)
        return "NOT_USEFUL" not in response.upper()

    def _generate_without_retrieval(self, question: str) -> str:
        """Answer from the model's own knowledge"""
        return self.llm.invoke(f"Answer the question: {question}")

    def _generate_with_context(self, question: str, docs: list) -> str:
        """Answer using the retrieved context"""
        context = "\n".join([doc.page_content for doc in docs])
        return self.llm.invoke(
            f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"
        )

    def _refine_answer(self, question: str, docs: list, answer: str) -> str:
        """Revise an unsupported or unhelpful answer"""
        context = "\n".join([doc.page_content for doc in docs])
        return self.llm.invoke(
            f"Context:\n{context}\n\nQuestion: {question}\n"
            f"Previous answer: {answer}\n\nProvide an improved answer:"
        )
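To trace the Self-RAG decision sequence without any model calls, here is a runnable sketch in which a `ScriptedLLM` replays canned judgments in order; the responses, documents, and `self_rag_step` helper are fabricated for illustration and mirror the steps above in simplified form:

```python
# Scripted walk-through of the Self-RAG decision sequence (no API calls).
class ScriptedLLM:
    def __init__(self, responses):
        self._responses = iter(responses)

    def invoke(self, prompt: str) -> str:
        # Replay the next canned response regardless of the prompt.
        return next(self._responses)

def self_rag_step(llm, question, docs):
    # 1. Is retrieval needed at all?
    if "YES" not in llm.invoke(f"Is retrieval needed for: {question}").upper():
        return {"answer": llm.invoke(question), "retrieval_used": False}
    # 2-3. Keep only documents the critic judges relevant
    #      (test the negative label to dodge the substring pitfall).
    relevant = [
        d for d in docs
        if "IRRELEVANT" not in llm.invoke(f"Relevant to '{question}'? {d}").upper()
    ]
    # 4. Generate an answer from the surviving context.
    answer = llm.invoke(f"Answer '{question}' using: {relevant}")
    # 5. Reflection: is the answer grounded in that context?
    supported = "NOT_SUPPORTED" not in llm.invoke(f"Supported? {answer}").upper()
    return {"answer": answer, "retrieval_used": True,
            "relevant_docs": relevant, "is_supported": supported}

llm = ScriptedLLM([
    "YES",                    # retrieval is needed
    "RELEVANT",               # judgment for doc 1
    "IRRELEVANT",             # judgment for doc 2
    "Paris is the capital.",  # generated answer
    "SUPPORTED",              # support check
])
result = self_rag_step(llm, "What is the capital of France?",
                       ["Paris facts...", "Berlin facts..."])
print(result["relevant_docs"], result["is_supported"])
# → ['Paris facts...'] True
```

The scripted run shows the critic dropping the irrelevant document before generation and then approving the grounded answer, which is the core of the Self-RAG loop.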
Key Takeaways¶
Advanced RAG techniques¶
1. HyDE: improve retrieval quality via hypothetical documents
2. Query Expansion: search with multiple query variants
3. Agentic RAG: dynamic retrieval driven by an LLM agent
4. Multi-hop: multi-step reasoning
5. RAPTOR: hierarchical summary tree
6. ColBERT: token-level late interaction
7. Self-RAG: self-reflection and verification

Choosing a technique¶
Simple QA → basic RAG
Complex questions → Multi-hop + Agentic
Long documents → RAPTOR
High-precision retrieval → ColBERT
Quality-critical → Self-RAG
References¶
- Gao et al. (2022). "Precise Zero-Shot Dense Retrieval without Relevance Labels" (HyDE)
- Sarthi et al. (2024). "RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval"
- Khattab et al. (2020). "ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction"
- Asai et al. (2023). "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection"