23. Advanced RAG

Overview

This chapter moves beyond basic RAG to more sophisticated retrieval and generation strategies, covering recent techniques such as Agentic RAG, multi-hop reasoning, HyDE, and RAPTOR.


1. RAG Limitations and Advanced Techniques

1.1 Limitations of Basic RAG

Problems with basic RAG:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  1. Single-retrieval limit                              β”‚
β”‚     - One retrieval is not enough for complex questions β”‚
β”‚     - Multi-step reasoning is required                  β”‚
β”‚                                                         β”‚
β”‚  2. Query-document mismatch                             β”‚
β”‚     - Questions and documents differ in style           β”‚
β”‚     - Embedding similarity has limits                   β”‚
β”‚                                                         β”‚
β”‚  3. Context length limits                               β”‚
β”‚     - Hard to handle many relevant documents            β”‚
β”‚     - Important information may be missed               β”‚
β”‚                                                         β”‚
β”‚  4. Freshness and accuracy                              β”‚
β”‚     - Outdated information                              β”‚
β”‚     - Reliability is hard to verify                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

1.2 A Taxonomy of Advanced RAG Techniques

Advanced RAG techniques:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Pre-Retrieval                                          β”‚
β”‚  β”œβ”€β”€ Query Transformation (HyDE, Query Expansion)       β”‚
β”‚  └── Query Routing                                      β”‚
β”‚                                                         β”‚
β”‚  Retrieval                                              β”‚
β”‚  β”œβ”€β”€ Hybrid Search (Dense + Sparse)                     β”‚
β”‚  β”œβ”€β”€ Multi-step Retrieval                               β”‚
β”‚  └── Hierarchical Retrieval (RAPTOR)                    β”‚
β”‚                                                         β”‚
β”‚  Post-Retrieval                                         β”‚
β”‚  β”œβ”€β”€ Reranking                                          β”‚
β”‚  β”œβ”€β”€ Context Compression                                β”‚
β”‚  └── Self-Reflection                                    β”‚
β”‚                                                         β”‚
β”‚  Generation                                             β”‚
β”‚  β”œβ”€β”€ Chain-of-Thought RAG                               β”‚
β”‚  └── Agentic RAG                                        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
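Of the retrieval-stage techniques above, Hybrid Search is the only one without its own section below, so here is a minimal sketch: fuse a sparse keyword score (a toy term-overlap function standing in for BM25) with a dense cosine score via a weighted sum. All function names and the toy scoring are illustrative assumptions, not a library API.

```python
import numpy as np

def sparse_score(query: str, doc: str) -> float:
    """Toy sparse score: fraction of query terms present in the doc
    (a stand-in for BM25 in this sketch)."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def dense_score(query_vec: np.ndarray, doc_vec: np.ndarray) -> float:
    """Cosine similarity between embedding vectors."""
    return float(np.dot(query_vec, doc_vec)
                 / (np.linalg.norm(query_vec) * np.linalg.norm(doc_vec)))

def hybrid_search(query, query_vec, docs, doc_vecs, alpha=0.5, k=3):
    """Rank documents by alpha * dense + (1 - alpha) * sparse."""
    scored = []
    for doc, vec in zip(docs, doc_vecs):
        score = (alpha * dense_score(query_vec, vec)
                 + (1 - alpha) * sparse_score(query, doc))
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]
```

In practice the sparse side would be BM25 (e.g. a library implementation) and alpha would be tuned on a validation set.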

2. Query Transformation

2.1 HyDE (Hypothetical Document Embeddings)

The HyDE idea:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Query: "What is the capital of France?"                β”‚
β”‚                                                         β”‚
β”‚  Baseline: search directly with the query embedding     β”‚
β”‚        (question ↔ document style gap)                  β”‚
β”‚                                                         β”‚
β”‚  HyDE: generate a hypothetical document, then search    β”‚
β”‚        Query β†’ "Paris is the capital of France..."      β”‚
β”‚        β†’ embed this passage and search with it          β”‚
β”‚        (document ↔ document style match)                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
from langchain.chains import HypotheticalDocumentEmbedder
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings

class HyDERetriever:
    """HyDE 검색기"""

    def __init__(self, llm, embeddings, vectorstore):
        self.llm = llm
        self.embeddings = embeddings
        self.vectorstore = vectorstore

    def generate_hypothetical_document(self, query: str) -> str:
        """가상 λ¬Έμ„œ 생성"""
        prompt = f"""Write a short passage that would answer the following question.
The passage should be factual and informative.

Question: {query}

Passage:"""

        response = self.llm.invoke(prompt)
        return response

    def retrieve(self, query: str, k: int = 5) -> list:
        """HyDE 검색"""
        # 1. 가상 λ¬Έμ„œ 생성
        hypothetical_doc = self.generate_hypothetical_document(query)

        # 2. Embed the hypothetical document
        doc_embedding = self.embeddings.embed_query(hypothetical_doc)

        # 3. Search for similar documents
        results = self.vectorstore.similarity_search_by_vector(
            doc_embedding, k=k
        )

        return results


# LangChain's built-in HyDE
def setup_hyde_chain():
    base_embeddings = OpenAIEmbeddings()
    llm = OpenAI(temperature=0)

    embeddings = HypotheticalDocumentEmbedder.from_llm(
        llm, base_embeddings, "web_search"
    )

    return embeddings

2.2 Query Expansion

class QueryExpander:
    """쿼리 ν™•μž₯"""

    def __init__(self, llm):
        self.llm = llm

    def expand_query(self, query: str, num_variations: int = 3) -> list:
        """쿼리λ₯Ό μ—¬λŸ¬ λ³€ν˜•μœΌλ‘œ ν™•μž₯"""
        prompt = f"""Generate {num_variations} different versions of the following question.
Each version should ask the same thing but use different words or perspectives.

Original question: {query}

Variations:
1."""

        response = self.llm.invoke(prompt)

        # νŒŒμ‹±
        variations = [query]  # 원본 포함
        for line in response.split("\n"):
            line = line.strip()
            if line and line[0].isdigit():
                # "1. question" ν˜•μ‹
                variation = line.split(".", 1)[-1].strip()
                variations.append(variation)

        return variations[:num_variations + 1]

    def retrieve_with_expansion(
        self,
        query: str,
        retriever,
        k: int = 5
    ) -> list:
        """ν™•μž₯된 쿼리둜 검색"""
        variations = self.expand_query(query)

        all_docs = []
        seen = set()

        for variation in variations:
            docs = retriever.get_relevant_documents(variation)
            for doc in docs:
                doc_id = hash(doc.page_content)
                if doc_id not in seen:
                    seen.add(doc_id)
                    all_docs.append(doc)

        # μƒμœ„ k개 λ°˜ν™˜ (RRF λ˜λŠ” 기타 λ°©λ²•μœΌλ‘œ μ •λ ¬)
        return all_docs[:k]
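The comment above leaves ranking to "RRF or another fusion method". A minimal sketch of Reciprocal Rank Fusion over the per-variation rankings (the function name is illustrative; the constant 60 is the value used in the original RRF paper):

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, c: int = 60, k: int = 5):
    """Fuse several ranked lists of document IDs.

    Each document scores sum(1 / (c + rank)) over the lists it appears in;
    the constant c damps the dominance of top ranks.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

In retrieve_with_expansion, keeping one ranked list per query variation and fusing them with a function like this would replace the first-come-first-kept ordering above.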

3. Agentic RAG

3.1 Concept

Agentic RAG:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  An LLM agent uses retrieval tools dynamically          β”‚
β”‚                                                         β”‚
β”‚  Agent loop:                                            β”‚
β”‚  1. Analyze the question                                β”‚
β”‚  2. Decide what information is needed                   β”‚
β”‚  3. Call retrieval tools (optional, repeatable)         β”‚
β”‚  4. Evaluate the results                                β”‚
β”‚  5. Need more retrieval? β†’ repeat                       β”‚
β”‚  6. Generate the final answer                           β”‚
β”‚                                                         β”‚
β”‚  vs basic RAG:                                          β”‚
β”‚  Query β†’ Retrieve β†’ Generate (fixed pipeline)           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

3.2 Implementation

from langchain.agents import Tool, AgentExecutor, create_react_agent
from langchain.prompts import PromptTemplate

class AgenticRAG:
    """Agentic RAG μ‹œμŠ€ν…œ"""

    def __init__(self, llm, vectorstore, web_search=None):
        self.llm = llm
        self.vectorstore = vectorstore
        self.web_search = web_search

        self.tools = self._setup_tools()
        self.agent = self._create_agent()

    def _setup_tools(self) -> list:
        """도ꡬ μ„€μ •"""
        tools = [
            Tool(
                name="search_knowledge_base",
                func=self._search_kb,
                description="Search the internal knowledge base for relevant information. Use this for company-specific or domain-specific questions."
            ),
            Tool(
                name="search_web",
                func=self._search_web,
                description="Search the web for current information. Use this for recent events or general knowledge."
            ),
            Tool(
                name="lookup_specific",
                func=self._lookup_specific,
                description="Look up specific facts or definitions. Use this when you need precise information."
            )
        ]
        return tools

    def _search_kb(self, query: str) -> str:
        """지식 베이슀 검색"""
        docs = self.vectorstore.similarity_search(query, k=3)
        return "\n\n".join([doc.page_content for doc in docs])

    def _search_web(self, query: str) -> str:
        """μ›Ή 검색 (μ™ΈλΆ€ API ν•„μš”)"""
        if self.web_search:
            return self.web_search.run(query)
        return "Web search not available."

    def _lookup_specific(self, query: str) -> str:
        """νŠΉμ • 정보 쑰회"""
        docs = self.vectorstore.similarity_search(query, k=1)
        if docs:
            return docs[0].page_content
        return "No specific information found."

    def _create_agent(self):
        """ReAct Agent 생성"""
        prompt = PromptTemplate.from_template("""Answer the following question using the available tools.
Think step by step about what information you need.

Question: {input}

You have access to these tools:
{tools}

Use the following format:
Thought: What do I need to find out?
Action: the tool to use, one of [{tool_names}]
Action Input: the input to the tool
Observation: the result of the tool
... (repeat as needed)
Thought: I now have enough information
Final Answer: the final answer

Begin!

{agent_scratchpad}""")

        agent = create_react_agent(self.llm, self.tools, prompt)
        return AgentExecutor(agent=agent, tools=self.tools, verbose=True)

    def query(self, question: str) -> str:
        """질문 처리"""
        result = self.agent.invoke({"input": question})
        return result["output"]


# Usage example
def agentic_rag_example():
    from langchain.llms import OpenAI
    from langchain.vectorstores import Chroma

    llm = OpenAI(temperature=0)
    vectorstore = Chroma(...)  # configuration required

    rag = AgenticRAG(llm, vectorstore)

    # A complex question
    answer = rag.query(
        "Compare our company's revenue growth in 2023 with the industry average"
    )
    print(answer)

4. Multi-hop Reasoning

4.1 Concept

Multi-hop Reasoning:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Question: "What is the population of Biden's           β”‚
β”‚  birthplace?"                                           β”‚
β”‚                                                         β”‚
β”‚  Hop 1: "Where was Biden born?" β†’ "Scranton, PA"        β”‚
β”‚  Hop 2: "What is Scranton's population?" β†’ "76,328"     β”‚
β”‚                                                         β”‚
β”‚  Final answer: "76,328"                                 β”‚
β”‚                                                         β”‚
β”‚  Hard to find the answer directly with a single search  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

4.2 Implementation

class MultiHopRAG:
    """Multi-hop Reasoning RAG"""

    def __init__(self, llm, retriever, max_hops: int = 3):
        self.llm = llm
        self.retriever = retriever
        self.max_hops = max_hops

    def decompose_question(self, question: str) -> list:
        """μ§ˆλ¬Έμ„ ν•˜μœ„ 질문으둜 λΆ„ν•΄"""
        prompt = f"""Break down the following complex question into simpler sub-questions.
Each sub-question should be answerable independently.

Question: {question}

Sub-questions (one per line):"""

        response = self.llm.invoke(prompt)
        sub_questions = [q.strip() for q in response.split("\n") if q.strip()]
        return sub_questions

    def answer_with_hops(self, question: str) -> dict:
        """닀단계 μΆ”λ‘ μœΌλ‘œ λ‹΅λ³€"""
        reasoning_chain = []
        context = ""

        for hop in range(self.max_hops):
            # Decide the next query from the current context
            if hop == 0:
                current_query = question
            else:
                current_query = self._generate_follow_up(
                    question, context, reasoning_chain
                )

            if current_query is None:
                break

            # Retrieve
            docs = self.retriever.get_relevant_documents(current_query)
            new_context = "\n".join([doc.page_content for doc in docs])

            # Generate an intermediate answer
            intermediate_answer = self._generate_intermediate_answer(
                current_query, new_context
            )

            reasoning_chain.append({
                "hop": hop + 1,
                "query": current_query,
                "answer": intermediate_answer
            })

            context += f"\n{intermediate_answer}"

            # Check whether we have enough information
            if self._has_enough_info(question, context):
                break

        # Final answer
        final_answer = self._generate_final_answer(question, reasoning_chain)

        return {
            "question": question,
            "reasoning_chain": reasoning_chain,
            "final_answer": final_answer
        }

    def _generate_follow_up(self, original_q, context, chain) -> str:
        """후속 질문 생성"""
        chain_text = "\n".join([
            f"Q: {step['query']}\nA: {step['answer']}"
            for step in chain
        ])

        prompt = f"""Based on the original question and what we've learned so far,
what additional information do we need?

Original question: {original_q}

What we've found:
{chain_text}

If we have enough information to answer, respond with "DONE".
Otherwise, provide the next question to search for:"""

        response = self.llm.invoke(prompt)

        if "DONE" in response.upper():
            return None
        return response.strip()

    def _generate_intermediate_answer(self, query, context) -> str:
        """쀑간 λ‹΅λ³€ 생성"""
        prompt = f"""Based on the following context, answer the question briefly.

Context: {context}

Question: {query}

Answer:"""

        return self.llm.invoke(prompt)

    def _has_enough_info(self, question, context) -> bool:
        """μΆ©λΆ„ν•œ 정보가 μžˆλŠ”μ§€ 확인"""
        prompt = f"""Can you answer the following question based on this information?

Question: {question}
Information: {context}

Answer YES or NO:"""

        response = self.llm.invoke(prompt)
        return "YES" in response.upper()

    def _generate_final_answer(self, question, chain) -> str:
        """μ΅œμ’… λ‹΅λ³€ 생성"""
        chain_text = "\n".join([
            f"Step {step['hop']}: {step['query']} β†’ {step['answer']}"
            for step in chain
        ])

        prompt = f"""Based on the reasoning chain below, provide a final answer.

Question: {question}

Reasoning:
{chain_text}

Final Answer:"""

        return self.llm.invoke(prompt)

5. RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval)

5.1 Concept

RAPTOR structure:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Level 3 (top-level summary)                            β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                   β”‚
β”‚  β”‚     Abstract Summary              β”‚                  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                   β”‚
β”‚              ↑                                          β”‚
β”‚  Level 2 (cluster summaries)                            β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”          β”‚
β”‚  β”‚ Summary1 β”‚    β”‚ Summary2 β”‚    β”‚ Summary3 β”‚          β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜          β”‚
β”‚      ↑   ↑          ↑   ↑          ↑   ↑              β”‚
β”‚  Level 1 (chunk clustering)                             β”‚
β”‚  [C1][C2][C3]    [C4][C5][C6]    [C7][C8][C9]          β”‚
β”‚      ↑   ↑   ↑      ↑   ↑   ↑      ↑   ↑   ↑          β”‚
β”‚  Level 0 (original chunks)                              β”‚
β”‚  [Chunk1][Chunk2]...[ChunkN]                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Retrieval: search multiple levels simultaneously to gather information at different levels of abstraction

5.2 Implementation

from sklearn.cluster import KMeans
import numpy as np

class RAPTOR:
    """RAPTOR 계측적 검색"""

    def __init__(self, llm, embeddings, num_levels: int = 3):
        self.llm = llm
        self.embeddings = embeddings
        self.num_levels = num_levels
        self.tree = {}

    def build_tree(self, documents: list, cluster_size: int = 5):
        """RAPTOR 트리 ꡬ좕"""
        # Level 0: 원본 청크
        self.tree[0] = documents
        current_docs = documents

        for level in range(1, self.num_levels):
            # Compute embeddings
            texts = [doc.page_content for doc in current_docs]
            embeddings = self.embeddings.embed_documents(texts)

            # ν΄λŸ¬μŠ€ν„°λ§
            n_clusters = max(len(current_docs) // cluster_size, 1)
            kmeans = KMeans(n_clusters=n_clusters)
            clusters = kmeans.fit_predict(embeddings)

            # ν΄λŸ¬μŠ€ν„°λ³„ μš”μ•½
            summaries = []
            for cluster_id in range(n_clusters):
                cluster_docs = [
                    doc for doc, c in zip(current_docs, clusters)
                    if c == cluster_id
                ]
                summary = self._summarize_cluster(cluster_docs)
                summaries.append(summary)

            self.tree[level] = summaries
            current_docs = summaries

    def _summarize_cluster(self, docs: list) -> str:
        """ν΄λŸ¬μŠ€ν„° μš”μ•½"""
        combined_text = "\n\n".join([doc.page_content for doc in docs])

        prompt = f"""Summarize the following texts into a concise summary that captures the key information.

Texts:
{combined_text}

Summary:"""

        summary = self.llm.invoke(prompt)

        # Wrap as a Document object
        from langchain.schema import Document
        return Document(page_content=summary)

    def retrieve(self, query: str, k_per_level: int = 2) -> list:
        """계측적 검색"""
        all_results = []

        for level, docs in self.tree.items():
            # Search within each level
            texts = [doc.page_content for doc in docs]
            query_embedding = self.embeddings.embed_query(query)
            doc_embeddings = self.embeddings.embed_documents(texts)

            # Dot-product similarity (equals cosine for unit-normalized embeddings)
            similarities = np.dot(doc_embeddings, query_embedding)
            top_indices = np.argsort(similarities)[-k_per_level:]

            for idx in top_indices:
                all_results.append({
                    "level": level,
                    "document": docs[idx],
                    "score": similarities[idx]
                })

        # Sort by score
        all_results.sort(key=lambda x: x["score"], reverse=True)
        return all_results

6. ColBERT (Contextualized Late Interaction)

6.1 Concept

ColBERT vs Dense Retrieval:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Dense Retrieval (bi-encoder):                          β”‚
β”‚  Query β†’ [CLS] embedding                                β”‚
β”‚  Doc   β†’ [CLS] embedding                                β”‚
β”‚  Score = dot(query_emb, doc_emb)                        β”‚
β”‚  Problem: hard to capture complex meaning in one vector β”‚
β”‚                                                         β”‚
β”‚  ColBERT (late interaction):                            β”‚
β”‚  Query β†’ [q1, q2, ..., qn] (per-token embeddings)       β”‚
β”‚  Doc   β†’ [d1, d2, ..., dm] (per-token embeddings)       β”‚
β”‚  Score = Ξ£α΅’ maxβ±Ό sim(qα΅’, dβ±Ό)                           β”‚
β”‚  Advantage: token-level matching is more precise        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
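The Score line above can be sketched directly in NumPy. A toy illustration of late-interaction scoring, assuming token embeddings are unit-normalized so a dot product equals cosine similarity (not ColBERT's actual implementation):

```python
import numpy as np

def maxsim_score(query_embs: np.ndarray, doc_embs: np.ndarray) -> float:
    """Late-interaction MaxSim: for each query token, take the similarity of
    its best-matching document token, then sum over query tokens.

    query_embs: (n, dim); doc_embs: (m, dim); rows assumed unit-normalized.
    """
    sim = query_embs @ doc_embs.T        # (n, m) token-pair similarities
    return float(sim.max(axis=1).sum())  # max over doc tokens, sum over query
```

Because the max runs over document tokens, a long document is never penalized for extra tokens; only its best matches per query token count.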

6.2 Usage

from colbert import Indexer, Searcher
from colbert.infra import Run, RunConfig, ColBERTConfig

class ColBERTRetriever:
    """ColBERT 검색기"""

    def __init__(self, index_name: str = "my_index"):
        self.index_name = index_name
        self.config = ColBERTConfig(
            nbits=2,
            doc_maxlen=300,
            query_maxlen=32
        )

    def build_index(self, documents: list, collection_path: str):
        """인덱슀 ꡬ좕"""
        # λ¬Έμ„œλ₯Ό 파일둜 μ €μž₯
        with open(collection_path, 'w') as f:
            for doc in documents:
                f.write(doc + "\n")

        with Run().context(RunConfig(nranks=1)):
            indexer = Indexer(
                checkpoint="colbert-ir/colbertv2.0",
                config=self.config
            )
            indexer.index(
                name=self.index_name,
                collection=collection_path
            )

    def search(self, query: str, k: int = 10) -> list:
        """검색"""
        with Run().context(RunConfig(nranks=1)):
            searcher = Searcher(index=self.index_name)
            results = searcher.search(query, k=k)

        return results


# RAGatouille (a friendlier ColBERT wrapper)
def colbert_with_ragatouille():
    from ragatouille import RAGPretrainedModel

    rag = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

    # Indexing
    rag.index(
        collection=[
            "Document 1 content...",
            "Document 2 content..."
        ],
        index_name="my_index"
    )

    # Search
    results = rag.search("my query", k=5)
    return results

7. Self-RAG (Self-Reflective RAG)

class SelfRAG:
    """Self-RAG: 자기 μ„±μ°° RAG"""

    def __init__(self, llm, retriever):
        self.llm = llm
        self.retriever = retriever

    def query(self, question: str) -> dict:
        """Self-RAG 질의"""
        # 1. 검색 ν•„μš”μ„± νŒλ‹¨
        needs_retrieval = self._assess_retrieval_need(question)

        if not needs_retrieval:
            # Answer directly without retrieval
            answer = self._generate_without_retrieval(question)
            return {"answer": answer, "retrieval_used": False}

        # 2. Retrieve
        docs = self.retriever.get_relevant_documents(question)

        # 3. Grade relevance (per document)
        relevant_docs = []
        for doc in docs:
            if self._is_relevant(question, doc):
                relevant_docs.append(doc)

        # 4. Generate an answer
        answer = self._generate_with_context(question, relevant_docs)

        # 5. Assess answer quality
        is_supported = self._check_support(answer, relevant_docs)
        is_useful = self._check_usefulness(question, answer)

        # 6. Retry if needed
        if not is_supported or not is_useful:
            answer = self._refine_answer(question, relevant_docs, answer)

        return {
            "answer": answer,
            "retrieval_used": True,
            "relevant_docs": relevant_docs,
            "is_supported": is_supported,
            "is_useful": is_useful
        }

    def _assess_retrieval_need(self, question: str) -> bool:
        """검색 ν•„μš”μ„± 평가"""
        prompt = f"""Determine if external knowledge is needed to answer this question.

Question: {question}

Answer YES if retrieval is needed, NO if you can answer from general knowledge:"""

        response = self.llm.invoke(prompt)
        return "YES" in response.upper()

    def _is_relevant(self, question: str, doc) -> bool:
        """Grade document relevance."""
        prompt = f"""Is this document relevant to the question?

Question: {question}
Document: {doc.page_content[:500]}

Answer RELEVANT or IRRELEVANT:"""

        response = self.llm.invoke(prompt).upper()
        # Check the negative label first: "IRRELEVANT" contains "RELEVANT"
        return "IRRELEVANT" not in response and "RELEVANT" in response

    def _check_support(self, answer: str, docs: list) -> bool:
        """Check whether the answer is supported by the documents."""
        context = "\n".join([doc.page_content for doc in docs])

        prompt = f"""Is this answer supported by the given context?

Context: {context}
Answer: {answer}

Respond SUPPORTED or NOT_SUPPORTED:"""

        response = self.llm.invoke(prompt).upper()
        # Check the negative label first: "NOT_SUPPORTED" contains "SUPPORTED"
        return "NOT_SUPPORTED" not in response and "SUPPORTED" in response

    def _check_usefulness(self, question: str, answer: str) -> bool:
        """Check whether the answer is useful."""
        prompt = f"""Does this answer actually address the question?

Question: {question}
Answer: {answer}

Respond USEFUL or NOT_USEFUL:"""

        response = self.llm.invoke(prompt).upper()
        # Check the negative label first: "NOT_USEFUL" contains "USEFUL"
        return "NOT_USEFUL" not in response and "USEFUL" in response

    def _generate_without_retrieval(self, question: str) -> str:
        """Answer directly from the model's parametric knowledge."""
        return self.llm.invoke(
            f"Answer the following question.\n\nQuestion: {question}\n\nAnswer:"
        )

    def _generate_with_context(self, question: str, docs: list) -> str:
        """Answer using the retrieved context."""
        context = "\n".join([doc.page_content for doc in docs])
        return self.llm.invoke(
            f"Answer the question based on the context.\n\n"
            f"Context: {context}\n\nQuestion: {question}\n\nAnswer:"
        )

    def _refine_answer(self, question: str, docs: list, draft: str) -> str:
        """Rewrite a draft answer so it is grounded in the retrieved context."""
        context = "\n".join([doc.page_content for doc in docs])
        return self.llm.invoke(
            f"The draft answer below may be unsupported or off-topic. "
            f"Rewrite it so it answers the question using only the context.\n\n"
            f"Context: {context}\n\nQuestion: {question}\n\n"
            f"Draft answer: {draft}\n\nImproved answer:"
        )

Key Takeaways

Advanced RAG techniques

1. HyDE: improve retrieval quality with hypothetical documents
2. Query Expansion: retrieve with multiple query variants
3. Agentic RAG: dynamic retrieval driven by an LLM agent
4. Multi-hop: multi-step reasoning
5. RAPTOR: hierarchical summary tree
6. ColBERT: token-level late interaction
7. Self-RAG: self-reflection and verification

Selection guide

Simple QA β†’ basic RAG
Complex questions β†’ Multi-hop + Agentic
Long documents β†’ RAPTOR
Precision retrieval β†’ ColBERT
Quality-critical β†’ Self-RAG

References

  1. Gao et al. (2022). "Precise Zero-Shot Dense Retrieval without Relevance Labels" (HyDE)
  2. Sarthi et al. (2024). "RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval"
  3. Khattab et al. (2020). "ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction"
  4. Asai et al. (2023). "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection"