17. GPT-4V, GPT-4o, Gemini & Claude 3

๊ฐœ์š”

GPT-4V(ision), GPT-4o, Gemini, and Claude 3 are currently the most capable commercial multimodal AI models. This lesson covers their capabilities, API usage, and practical applications.

2024๋…„ ์—…๋ฐ์ดํŠธ: - GPT-4o (2024.05): GPT-4์˜ "omni" ๋ฒ„์ „, ๋„ค์ดํ‹ฐ๋ธŒ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ - Gemini 1.5 Pro: 2M ํ† ํฐ ์ปจํ…์ŠคํŠธ, ๋น„๋””์˜ค/์˜ค๋””์˜ค ๋„ค์ดํ‹ฐ๋ธŒ - Claude 3 Family (2024.03): Haiku, Sonnet, Opus ๋ผ์ธ์—… - Claude 3.5 Sonnet (2024.06): ๋น„์ „ ๊ธฐ๋Šฅ ๊ฐ•ํ™”


1. GPT-4V (GPT-4 with Vision)

1.1 ๊ธฐ๋Šฅ ๊ฐœ์š”

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    GPT-4V ์ฃผ์š” ๊ธฐ๋Šฅ                              โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                  โ”‚
โ”‚  ๐Ÿ–ผ๏ธ ์ด๋ฏธ์ง€ ์ดํ•ด                                                  โ”‚
โ”‚  - ์ƒ์„ธ ์„ค๋ช… ๋ฐ ๋ถ„์„                                            โ”‚
โ”‚  - ๋‹ค์ค‘ ์ด๋ฏธ์ง€ ๋น„๊ต                                             โ”‚
โ”‚  - ์ฐจํŠธ/๊ทธ๋ž˜ํ”„ ํ•ด์„                                             โ”‚
โ”‚                                                                  โ”‚
โ”‚  ๐Ÿ“ ํ…์ŠคํŠธ ์ธ์‹ (OCR)                                            โ”‚
โ”‚  - ์†๊ธ€์”จ ์ธ์‹                                                   โ”‚
โ”‚  - ๋‹ค๊ตญ์–ด ํ…์ŠคํŠธ                                                โ”‚
โ”‚  - ๋ฌธ์„œ ๊ตฌ์กฐ ์ดํ•ด                                               โ”‚
โ”‚                                                                  โ”‚
โ”‚  ๐Ÿ” ์„ธ๋ถ€ ๋ถ„์„                                                    โ”‚
โ”‚  - ๊ฐ์ฒด ์‹๋ณ„ ๋ฐ ์นด์šดํŒ…                                          โ”‚
โ”‚  - ๊ณต๊ฐ„ ๊ด€๊ณ„ ์ดํ•ด                                               โ”‚
โ”‚  - ์†์„ฑ ์ถ”๋ก                                                      โ”‚
โ”‚                                                                  โ”‚
โ”‚  ๐Ÿ’ก ์ถ”๋ก  ๋ฐ ์ฐฝ์ž‘                                                  โ”‚
โ”‚  - ์ด๋ฏธ์ง€ ๊ธฐ๋ฐ˜ ์ถ”๋ก                                              โ”‚
โ”‚  - ์ฝ”๋“œ ์ƒ์„ฑ (UI ์Šคํฌ๋ฆฐ์ƒท โ†’ ์ฝ”๋“œ)                               โ”‚
โ”‚  - ์ฐฝ์˜์  ๊ธ€์“ฐ๊ธฐ                                                โ”‚
โ”‚                                                                  โ”‚
โ”‚  โš ๏ธ ์ œํ•œ ์‚ฌํ•ญ                                                    โ”‚
โ”‚  - ์˜๋ฃŒ ์ง„๋‹จ ๋ถˆ๊ฐ€                                               โ”‚
โ”‚  - ์–ผ๊ตด ์ธ์‹/์‹ ์› ํ™•์ธ ๋ถˆ๊ฐ€                                     โ”‚
โ”‚  - ์‹ค์‹œ๊ฐ„ ๋น„๋””์˜ค ๋ฏธ์ง€์› (์ด๋ฏธ์ง€๋งŒ)                              โ”‚
โ”‚                                                                  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

1.2 Using the API

from openai import OpenAI
import base64
from pathlib import Path

client = OpenAI()

def encode_image(image_path: str) -> str:
    """์ด๋ฏธ์ง€๋ฅผ base64๋กœ ์ธ์ฝ”๋”ฉ"""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode()
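
The data URLs in the examples below hardcode `image/jpeg`. A small helper (a sketch using the standard `mimetypes` module; the JPEG fallback for unknown extensions is an assumption) can pick the media type from the file extension instead:

```python
import base64
import mimetypes


def to_data_url(image_path: str) -> str:
    """Build a data URL with the media type guessed from the file extension."""
    media_type, _ = mimetypes.guess_type(image_path)
    if media_type is None:
        media_type = "image/jpeg"  # fallback assumption for unknown extensions
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode()
    return f"data:{media_type};base64,{encoded}"
```

The returned string can be passed directly as the `"url"` value of an `image_url` content part.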


def gpt4v_basic(image_path: str, prompt: str) -> str:
    """๊ธฐ๋ณธ ์ด๋ฏธ์ง€ ๋ถ„์„"""

    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{encode_image(image_path)}"
                        }
                    }
                ]
            }
        ],
        max_tokens=1024
    )

    return response.choices[0].message.content


def gpt4v_multi_image(image_paths: list, prompt: str) -> str:
    """๋‹ค์ค‘ ์ด๋ฏธ์ง€ ๋ถ„์„"""

    content = [{"type": "text", "text": prompt}]

    for path in image_paths:
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{encode_image(path)}"}
        })

    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{"role": "user", "content": content}],
        max_tokens=2048
    )

    return response.choices[0].message.content


def gpt4v_with_detail(image_path: str, prompt: str, detail: str = "high") -> str:
    """
    ์ƒ์„ธ ์ˆ˜์ค€ ์ง€์ •

    detail:
    - "low": ๋น ๋ฅด๊ณ  ์ €๋ ด, ์ €ํ•ด์ƒ๋„ ๋ถ„์„
    - "high": ์ƒ์„ธ ๋ถ„์„, ๋” ๋งŽ์€ ํ† ํฐ ์‚ฌ์šฉ
    - "auto": ์ž๋™ ์„ ํƒ
    """

    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{encode_image(image_path)}",
                            "detail": detail
                        }
                    }
                ]
            }
        ],
        max_tokens=1024
    )

    return response.choices[0].message.content
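
The token cost of the `detail` setting can be estimated locally before sending a request. The sketch below follows the tiling rule OpenAI has documented for high-detail images (scale down to fit 2048×2048, scale the shortest side down to 768, then charge 170 tokens per 512-px tile plus an 85-token base); treat the constants as assumptions that may change:

```python
import math


def estimate_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Estimate the token cost of one image at the given detail level."""
    if detail == "low":
        return 85  # low detail is a flat cost regardless of size

    # Scale down to fit within a 2048 x 2048 square.
    if max(width, height) > 2048:
        scale = 2048 / max(width, height)
        width, height = int(width * scale), int(height * scale)

    # Scale down so the shortest side is at most 768 px.
    if min(width, height) > 768:
        scale = 768 / min(width, height)
        width, height = int(width * scale), int(height * scale)

    # 170 tokens per 512-px tile, plus an 85-token base.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 170 * tiles + 85
```

For example, a 1024×1024 image at high detail scales to 768×768 (four tiles), giving 765 tokens, the base figure quoted in the pricing section later.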


def gpt4v_url_image(image_url: str, prompt: str) -> str:
    """URL ์ด๋ฏธ์ง€ ๋ถ„์„"""

    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}}
                ]
            }
        ],
        max_tokens=1024
    )

    return response.choices[0].message.content

1.3 ์‹ค์ „ ์‘์šฉ

class GPT4VApplications:
    """GPT-4V ์‹ค์ „ ์‘์šฉ"""

    def __init__(self):
        self.client = OpenAI()

    def analyze_ui_screenshot(self, screenshot_path: str) -> dict:
        """UI ์Šคํฌ๋ฆฐ์ƒท ๋ถ„์„ ๋ฐ ์ฝ”๋“œ ์ƒ์„ฑ"""

        prompt = """Analyze this UI screenshot and:
        1. List all UI components visible
        2. Describe the layout structure
        3. Generate HTML/CSS code to recreate this UI

        Format your response as JSON with keys:
        - components: list of UI elements
        - layout: description of layout
        - html_code: HTML implementation
        - css_code: CSS styles
        """

        response = self._call_api(screenshot_path, prompt)

        # JSON ํŒŒ์‹ฑ
        import json
        try:
            return json.loads(response)
        except:
            return {"raw_response": response}

    def extract_data_from_chart(self, chart_path: str) -> str:
        """Extract data from a chart"""

        prompt = """Analyze this chart and extract:
        1. Chart type (bar, line, pie, etc.)
        2. Title and axis labels
        3. All data points with their values
        4. Key insights or trends

        Return as structured JSON.
        """

        return self._call_api(chart_path, prompt)

    def compare_images(self, image_paths: list) -> str:
        """์ด๋ฏธ์ง€ ๋น„๊ต ๋ถ„์„"""

        prompt = """Compare these images and describe:
        1. Similarities
        2. Differences
        3. Which image is better quality and why
        4. Any notable features in each
        """

        return gpt4v_multi_image(image_paths, prompt)

    def ocr_with_structure(self, document_path: str) -> str:
        """Structured OCR"""

        prompt = """Extract all text from this document and preserve:
        1. Headings and hierarchy
        2. Tables (as markdown)
        3. Lists (numbered and bulleted)
        4. Key-value pairs

        Return as structured markdown.
        """

        return self._call_api(document_path, prompt)

    def generate_alt_text(self, image_path: str) -> str:
        """์›น ์ ‘๊ทผ์„ฑ์„ ์œ„ํ•œ ๋Œ€์ฒด ํ…์ŠคํŠธ ์ƒ์„ฑ"""

        prompt = """Generate an appropriate alt text for this image.
        The alt text should be:
        1. Concise (under 125 characters)
        2. Descriptive of the main content
        3. Useful for screen reader users

        Just return the alt text, nothing else.
        """

        return self._call_api(image_path, prompt)

    def _call_api(self, image_path: str, prompt: str) -> str:
        response = self.client.chat.completions.create(
            model="gpt-4-vision-preview",
            messages=[
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:image/jpeg;base64,{encode_image(image_path)}",
                                "detail": "high"
                            }
                        }
                    ]
                }
            ],
            max_tokens=2048
        )
        return response.choices[0].message.content
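
`analyze_ui_screenshot` above can fail to parse even valid answers, because vision models frequently wrap their JSON in a markdown code fence. A defensive parser (a sketch; the fence-stripping heuristic is an assumption, not part of the API) recovers those cases:

```python
import json
import re


def parse_model_json(response: str) -> dict:
    """Parse JSON from a model response, stripping a markdown code fence if present."""
    text = response.strip()
    # Remove a leading ```json (or bare ```) fence and the trailing ``` fence.
    match = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if match:
        text = match.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {"raw_response": response}
```

Swapping this in for the bare `json.loads` call keeps the `raw_response` fallback while handling fenced output.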

2. GPT-4o (Omni)

2.1 GPT-4o Overview

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    GPT-4o vs GPT-4V ๋น„๊ต                         โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                  โ”‚
โ”‚  GPT-4V (๊ธฐ์กด):                                                  โ”‚
โ”‚  - ํ…์ŠคํŠธ + ์ด๋ฏธ์ง€ ์ž…๋ ฅ                                          โ”‚
โ”‚  - ๋ณ„๋„์˜ ๋น„์ „ ์ธ์ฝ”๋”                                            โ”‚
โ”‚  - ๋น„๊ต์  ๋А๋ฆฐ ์‘๋‹ต                                              โ”‚
โ”‚                                                                  โ”‚
โ”‚  GPT-4o (2024.05):                                               โ”‚
โ”‚  - ํ…์ŠคํŠธ + ์ด๋ฏธ์ง€ + ์˜ค๋””์˜ค ๋„ค์ดํ‹ฐ๋ธŒ                             โ”‚
โ”‚  - ๋‹จ์ผ ๋ชจ๋ธ์—์„œ ๋ชจ๋“  ๋ชจ๋‹ฌ๋ฆฌํ‹ฐ ์ฒ˜๋ฆฌ                              โ”‚
โ”‚  - 2๋ฐฐ ๋น ๋ฅธ ์‘๋‹ต, 50% ์ €๋ ดํ•œ ๊ฐ€๊ฒฉ                                โ”‚
โ”‚  - ์‹ค์‹œ๊ฐ„ ์Œ์„ฑ ๋Œ€ํ™” ๊ฐ€๋Šฅ                                         โ”‚
โ”‚                                                                  โ”‚
โ”‚  ์ฃผ์š” ๊ฐœ์„ ์ :                                                    โ”‚
โ”‚  โœ… ์†๋„: ํ‰๊ท  320ms ์‘๋‹ต (GPT-4V ๋Œ€๋น„ 2๋ฐฐ)                      โ”‚
โ”‚  โœ… ๋น„์šฉ: ์ž…๋ ฅ $5/1M, ์ถœ๋ ฅ $15/1M                                โ”‚
โ”‚  โœ… ๋น„์ „: ํ–ฅ์ƒ๋œ OCR, ์ฐจํŠธ ํ•ด์„                                  โ”‚
โ”‚  โœ… ์˜ค๋””์˜ค: ์‹ค์‹œ๊ฐ„ ์Œ์„ฑ ์ž…์ถœ๋ ฅ                                   โ”‚
โ”‚                                                                  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

2.2 Using the GPT-4o API

from openai import OpenAI
import base64

client = OpenAI()

def gpt4o_vision(image_path: str, prompt: str) -> str:
    """GPT-4o ์ด๋ฏธ์ง€ ๋ถ„์„"""

    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",  # GPT-4o ์‚ฌ์šฉ
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{image_data}",
                            "detail": "high"
                        }
                    }
                ]
            }
        ],
        max_tokens=1024
    )

    return response.choices[0].message.content


def gpt4o_audio(audio_path: str, prompt: str) -> str:
    """GPT-4o ์˜ค๋””์˜ค ๋ถ„์„ (Realtime API)"""

    # ์˜ค๋””์˜ค ํŒŒ์ผ ์ฝ๊ธฐ
    with open(audio_path, "rb") as f:
        audio_data = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o-audio-preview",
        modalities=["text"],
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": audio_data,
                            "format": "wav"
                        }
                    }
                ]
            }
        ]
    )

    return response.choices[0].message.content


# GPT-4o-mini: low-cost version
def gpt4o_mini_vision(image_path: str, prompt: str) -> str:
    """GPT-4o-mini: ๋น ๋ฅด๊ณ  ์ €๋ ดํ•œ ๋น„์ „ ๋ชจ๋ธ"""

    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # ์ €๋น„์šฉ ๋ฒ„์ „
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}
                    }
                ]
            }
        ],
        max_tokens=512
    )

    return response.choices[0].message.content

3. Google Gemini

3.1 Gemini Model Lineup

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    Gemini ๋ชจ๋ธ ๋น„๊ต                              โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                  โ”‚
โ”‚  Gemini 1.5 Flash:                                              โ”‚
โ”‚  - ๋น ๋ฅธ ์‘๋‹ต, ์ €๋น„์šฉ                                            โ”‚
โ”‚  - 1M ํ† ํฐ ์ปจํ…์ŠคํŠธ                                             โ”‚
โ”‚  - ์‹ค์‹œ๊ฐ„ ์‘์šฉ์— ์ ํ•ฉ                                           โ”‚
โ”‚                                                                  โ”‚
โ”‚  Gemini 1.5 Pro:                                                โ”‚
โ”‚  - ์ตœ๊ณ  ์„ฑ๋Šฅ                                                    โ”‚
โ”‚  - 2M ํ† ํฐ ์ปจํ…์ŠคํŠธ                                             โ”‚
โ”‚  - ๋ณต์žกํ•œ ์ถ”๋ก , ์ฝ”๋“œ ์ƒ์„ฑ                                       โ”‚
โ”‚                                                                  โ”‚
โ”‚  Gemini 1.0 Ultra:                                              โ”‚
โ”‚  - ๊ฐ€์žฅ ํฐ ๋ชจ๋ธ                                                 โ”‚
โ”‚  - ๋ณต์žกํ•œ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ํƒœ์Šคํฌ                                       โ”‚
โ”‚                                                                  โ”‚
โ”‚  ํŠน๋ณ„ ๊ธฐ๋Šฅ:                                                      โ”‚
โ”‚  - ๋„ค์ดํ‹ฐ๋ธŒ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ (ํ…์ŠคํŠธ, ์ด๋ฏธ์ง€, ์˜ค๋””์˜ค, ๋น„๋””์˜ค)           โ”‚
โ”‚  - ์ดˆ์žฅ๋ฌธ ์ปจํ…์ŠคํŠธ (1์‹œ๊ฐ„ ๋น„๋””์˜ค ๋ถ„์„ ๊ฐ€๋Šฅ)                     โ”‚
โ”‚  - Code execution ๋‚ด์žฅ                                          โ”‚
โ”‚                                                                  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

3.2 Using the Gemini API

import google.generativeai as genai
from PIL import Image

# API ํ‚ค ์„ค์ •
genai.configure(api_key="YOUR_API_KEY")

def gemini_basic(image_path: str, prompt: str) -> str:
    """๊ธฐ๋ณธ ์ด๋ฏธ์ง€ ๋ถ„์„"""

    model = genai.GenerativeModel('gemini-1.5-pro')

    image = Image.open(image_path)

    response = model.generate_content([prompt, image])

    return response.text


def gemini_multi_image(image_paths: list, prompt: str) -> str:
    """๋‹ค์ค‘ ์ด๋ฏธ์ง€ ๋ถ„์„"""

    model = genai.GenerativeModel('gemini-1.5-pro')

    content = [prompt]
    for path in image_paths:
        content.append(Image.open(path))

    response = model.generate_content(content)

    return response.text


def gemini_video_analysis(video_path: str, prompt: str) -> str:
    """๋น„๋””์˜ค ๋ถ„์„ (Gemini ํŠนํ™” ๊ธฐ๋Šฅ)"""

    model = genai.GenerativeModel('gemini-1.5-pro')

    # ๋น„๋””์˜ค ์—…๋กœ๋“œ
    video_file = genai.upload_file(video_path)

    # ์ฒ˜๋ฆฌ ์™„๋ฃŒ ๋Œ€๊ธฐ
    import time
    while video_file.state.name == "PROCESSING":
        time.sleep(10)
        video_file = genai.get_file(video_file.name)

    if video_file.state.name == "FAILED":
        raise ValueError("Video processing failed")

    response = model.generate_content([prompt, video_file])

    return response.text


def gemini_long_context(documents: list, query: str) -> str:
    """๊ธด ๋ฌธ์„œ ๋ถ„์„ (1M+ ํ† ํฐ)"""

    model = genai.GenerativeModel('gemini-1.5-pro')

    # ๋ชจ๋“  ๋ฌธ์„œ ๊ฒฐํ•ฉ
    content = [query]
    for doc in documents:
        if doc.endswith('.pdf'):
            content.append(genai.upload_file(doc))
        elif doc.endswith(('.jpg', '.png')):
            content.append(Image.open(doc))
        else:
            with open(doc, 'r') as f:
                content.append(f.read())

    response = model.generate_content(content)

    return response.text


def gemini_with_code_execution(prompt: str) -> dict:
    """์ฝ”๋“œ ์‹คํ–‰ ๊ธฐ๋Šฅ"""

    model = genai.GenerativeModel(
        'gemini-1.5-pro',
        tools='code_execution'
    )

    response = model.generate_content(prompt)

    # ์‹คํ–‰๋œ ์ฝ”๋“œ์™€ ๊ฒฐ๊ณผ ์ถ”์ถœ
    result = {
        'text': response.text,
        'code_execution': []
    }

    for part in response.parts:
        if hasattr(part, 'code_execution_result'):
            result['code_execution'].append({
                'code': part.text,
                'output': part.code_execution_result.output
            })

    return result

3.3 Gemini-Specific Applications

class GeminiApplications:
    """Gemini ํŠนํ™” ์‘์šฉ"""

    def __init__(self):
        self.model = genai.GenerativeModel('gemini-1.5-pro')

    def analyze_long_video(
        self,
        video_path: str,
        questions: list
    ) -> dict:
        """๊ธด ๋น„๋””์˜ค ๋ถ„์„ (1์‹œ๊ฐ„+)"""

        video_file = self._upload_and_wait(video_path)

        results = {}

        for question in questions:
            prompt = f"""Analyze this video and answer: {question}

            Provide timestamps when relevant.
            """

            response = self.model.generate_content([prompt, video_file])
            results[question] = response.text

        return results

    def multimodal_reasoning(
        self,
        images: list,
        audio_path: str = None,
        text: str = None
    ) -> str:
        """๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ์ถ”๋ก """

        content = []

        if text:
            content.append(text)

        for img_path in images:
            content.append(Image.open(img_path))

        if audio_path:
            audio_file = self._upload_and_wait(audio_path)
            content.append(audio_file)

        response = self.model.generate_content(content)

        return response.text

    def research_assistant(
        self,
        pdf_paths: list,
        research_question: str
    ) -> dict:
        """์—ฐ๊ตฌ ๋ณด์กฐ (๊ธด ๋ฌธ์„œ ๋ถ„์„)"""

        # Upload the PDFs
        files = [self._upload_and_wait(path) for path in pdf_paths]

        prompt = f"""You are a research assistant. Analyze these academic papers
        and answer the following research question:

        {research_question}

        Structure your response as:
        1. Summary of relevant findings from each paper
        2. Synthesis of the findings
        3. Gaps or contradictions
        4. Suggested future directions
        """

        content = [prompt] + files

        response = self.model.generate_content(content)

        return {
            'answer': response.text,
            'sources': pdf_paths
        }

    def _upload_and_wait(self, file_path: str):
        """ํŒŒ์ผ ์—…๋กœ๋“œ ๋ฐ ์ฒ˜๋ฆฌ ๋Œ€๊ธฐ"""
        import time

        file = genai.upload_file(file_path)

        while file.state.name == "PROCESSING":
            time.sleep(5)
            file = genai.get_file(file.name)

        return file
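
The fixed 5-second polling above works, but a capped exponential backoff returns sooner on short jobs and makes fewer requests on long ones. A minimal sketch (the base, cap, and factor values are arbitrary assumptions), with the polling logic separated out so it can be reused:

```python
import time
from typing import Callable


def backoff_delays(base: float = 1.0, cap: float = 30.0, factor: float = 2.0):
    """Yield an endless sequence of polling delays: base, base*factor, ..., capped."""
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)


def poll_until_done(get_state: Callable[[], str], sleep=time.sleep) -> str:
    """Poll get_state() with backoff until it leaves the PROCESSING state."""
    for delay in backoff_delays():
        state = get_state()
        if state != "PROCESSING":
            return state
        sleep(delay)
```

In `_upload_and_wait`, `get_state` would be something like `lambda: genai.get_file(file.name).state.name`; injecting `sleep` also makes the loop testable without waiting.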

4. Anthropic Claude 3

4.1 Claude 3 Model Lineup

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    Claude 3 Family (2024.03)                     โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                  โ”‚
โ”‚  Claude 3 Haiku:                                                 โ”‚
โ”‚  - ๊ฐ€์žฅ ๋น ๋ฅด๊ณ  ์ €๋ ด                                              โ”‚
โ”‚  - ์‹ค์‹œ๊ฐ„ ์‘์šฉ, ๋Œ€๋Ÿ‰ ์ฒ˜๋ฆฌ                                        โ”‚
โ”‚  - ๋น„์ „ ์ง€์›                                                     โ”‚
โ”‚                                                                  โ”‚
โ”‚  Claude 3 Sonnet:                                                โ”‚
โ”‚  - ์†๋„์™€ ์„ฑ๋Šฅ์˜ ๊ท ํ˜•                                            โ”‚
โ”‚  - ๋Œ€๋ถ€๋ถ„์˜ ๋น„์ฆˆ๋‹ˆ์Šค ์šฉ๋„์— ์ ํ•ฉ                                 โ”‚
โ”‚  - ๋น„์ „ ์ง€์›                                                     โ”‚
โ”‚                                                                  โ”‚
โ”‚  Claude 3 Opus:                                                  โ”‚
โ”‚  - ์ตœ๊ณ  ์„ฑ๋Šฅ                                                     โ”‚
โ”‚  - ๋ณต์žกํ•œ ์ถ”๋ก , ๋ถ„์„ ํƒœ์Šคํฌ                                      โ”‚
โ”‚  - ๋น„์ „ ์ง€์›                                                     โ”‚
โ”‚                                                                  โ”‚
โ”‚  Claude 3.5 Sonnet (2024.06):                                    โ”‚
โ”‚  - Opus ์ˆ˜์ค€ ์„ฑ๋Šฅ, Sonnet ๊ฐ€๊ฒฉ                                   โ”‚
โ”‚  - ํ–ฅ์ƒ๋œ ๋น„์ „, ์ฝ”๋”ฉ ๋Šฅ๋ ฅ                                        โ”‚
โ”‚  - 200K ํ† ํฐ ์ปจํ…์ŠคํŠธ                                            โ”‚
โ”‚                                                                  โ”‚
โ”‚  ํŠน์ง•:                                                            โ”‚
โ”‚  โœ… 200K ์ปจํ…์ŠคํŠธ ์œˆ๋„์šฐ (์ „ ๋ชจ๋ธ)                                โ”‚
โ”‚  โœ… ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ: ์ด๋ฏธ์ง€ ์ดํ•ด                                         โ”‚
โ”‚  โœ… ์•ˆ์ „์„ฑ: Constitutional AI ์ ์šฉ                                โ”‚
โ”‚  โœ… ๋„๊ตฌ ์‚ฌ์šฉ: Function Calling ์ง€์›                              โ”‚
โ”‚                                                                  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

4.2 Using the Claude API

import anthropic
import base64

client = anthropic.Anthropic()


def claude_vision(image_path: str, prompt: str, model: str = "claude-sonnet-4-20250514") -> str:
    """Claude ๋น„์ „ ๋ถ„์„"""

    # ์ด๋ฏธ์ง€ ์ธ์ฝ”๋”ฉ
    with open(image_path, "rb") as f:
        image_data = base64.standard_b64encode(f.read()).decode("utf-8")

    # ๋ฏธ๋””์–ด ํƒ€์ž… ๊ฒฐ์ •
    if image_path.endswith(".png"):
        media_type = "image/png"
    elif image_path.endswith(".gif"):
        media_type = "image/gif"
    elif image_path.endswith(".webp"):
        media_type = "image/webp"
    else:
        media_type = "image/jpeg"

    message = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": image_data,
                        },
                    },
                    {
                        "type": "text",
                        "text": prompt
                    }
                ],
            }
        ],
    )

    return message.content[0].text


def claude_multi_image(image_paths: list, prompt: str) -> str:
    """Claude ๋‹ค์ค‘ ์ด๋ฏธ์ง€ ๋ถ„์„"""

    content = []

    for path in image_paths:
        with open(path, "rb") as f:
            image_data = base64.standard_b64encode(f.read()).decode("utf-8")

        media_type = "image/png" if path.endswith(".png") else "image/jpeg"

        content.append({
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": media_type,
                "data": image_data,
            }
        })

    content.append({"type": "text", "text": prompt})

    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{"role": "user", "content": content}],
    )

    return message.content[0].text


def claude_with_tools(prompt: str, image_path: str = None) -> dict:
    """Claude Tool Use (Function Calling)"""

    tools = [
        {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    ]

    content = [{"type": "text", "text": prompt}]

    if image_path:
        with open(image_path, "rb") as f:
            image_data = base64.standard_b64encode(f.read()).decode("utf-8")
        content.insert(0, {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": image_data,
            }
        })

    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=[{"role": "user", "content": content}],
    )

    return {
        "content": message.content,
        "stop_reason": message.stop_reason
    }
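
When `stop_reason` is `"tool_use"`, the model is waiting for the tool's result: you execute the tool locally, then send a `tool_result` block back in a follow-up user turn. A sketch of that round trip (the `get_weather` stub and its return value are hypothetical):

```python
# Hypothetical local implementations keyed by tool name.
TOOL_IMPL = {
    "get_weather": lambda location: {"location": location, "temp_c": 21},
}


def run_tool(name: str, tool_input: dict) -> dict:
    """Execute a locally registered tool by name."""
    return TOOL_IMPL[name](**tool_input)


def complete_tool_call(client, first_message, tools, original_content):
    """Execute the requested tool and return Claude's final text answer."""
    tool_use = next(b for b in first_message.content if b.type == "tool_use")
    result = run_tool(tool_use.name, tool_use.input)

    follow_up = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": original_content},
            {"role": "assistant", "content": first_message.content},
            {
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": str(result),
                }],
            },
        ],
    )
    return follow_up.content[0].text
```

The assistant's `tool_use` turn must be echoed back verbatim, and the `tool_use_id` ties the result to the specific call.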

4.3 Claude-Specific Features

class ClaudeApplications:
    """Claude ํŠนํ™” ์‘์šฉ"""

    def __init__(self):
        self.client = anthropic.Anthropic()

    def long_document_analysis(self, document_text: str, query: str) -> str:
        """Long-document analysis (200K-token context)"""

        message = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[
                {
                    "role": "user",
                    "content": f"""Analyze the following document and answer the question.

Document:
{document_text}

Question: {query}
"""
                }
            ],
        )

        return message.content[0].text

    def code_review(self, code: str, language: str = "python") -> str:
        """Code review"""

        message = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            messages=[
                {
                    "role": "user",
                    "content": f"""Review the following {language} code.

```{language}
{code}
```

Please cover:
1. Potential bugs
2. Performance improvements
3. Code style suggestions
4. Security issues
"""
                }
            ],
        )

        return message.content[0].text

    def structured_output(self, image_path: str, schema: dict) -> dict:
        """Generate structured output from an image"""
        import json

        with open(image_path, "rb") as f:
            image_data = base64.standard_b64encode(f.read()).decode("utf-8")

        message = self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "image",
                            "source": {
                                "type": "base64",
                                "media_type": "image/jpeg",
                                "data": image_data,
                            }
                        },
                        {
                            "type": "text",
                            "text": f"""Analyze this image and return a result matching this JSON schema:

{json.dumps(schema, indent=2, ensure_ascii=False)}

Return only the JSON."""
                        }
                    ]
                }
            ],
        )

        return json.loads(message.content[0].text)
5. Comparison and Selection Guide

5.1 Multimodal Model Comparison

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ 2024 ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋ชจ๋ธ ๋น„๊ต โ”‚ โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค โ”‚ โ”‚ โ”‚ ๊ธฐ๋Šฅ GPT-4o Gemini 1.5 Pro Claude 3.5 Sonnet โ”‚ โ”‚ โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ โ”‚ โ”‚ ์ด๋ฏธ์ง€ ์ดํ•ด โ˜…โ˜…โ˜…โ˜…โ˜… โ˜…โ˜…โ˜…โ˜…โ˜… โ˜…โ˜…โ˜…โ˜…โ˜… โ”‚ โ”‚ ๋น„๋””์˜ค ๋ถ„์„ โœ— โ˜…โ˜…โ˜…โ˜…โ˜… (๋„ค์ดํ‹ฐ๋ธŒ) โœ— โ”‚ โ”‚ ์˜ค๋””์˜ค ๋ถ„์„ โ˜…โ˜…โ˜…โ˜…โ˜† โ˜…โ˜…โ˜…โ˜…โ˜† โœ— โ”‚ โ”‚ ์ปจํ…์ŠคํŠธ 128K 2M 200K โ”‚ โ”‚ ์ฝ”๋“œ ์‹คํ–‰ โœ— โ˜…โ˜…โ˜…โ˜…โ˜† (๋‚ด์žฅ) โœ— โ”‚ โ”‚ ์†๋„ โ˜…โ˜…โ˜…โ˜…โ˜… โ˜…โ˜…โ˜…โ˜…โ˜† (Flash) โ˜…โ˜…โ˜…โ˜…โ˜† โ”‚ โ”‚ ๊ฐ€๊ฒฉ ์ค‘๊ฐ„ ๋‚ฎ์Œ ์ค‘๊ฐ„ โ”‚ โ”‚ ์ฝ”๋”ฉ ๋Šฅ๋ ฅ โ˜…โ˜…โ˜…โ˜…โ˜† โ˜…โ˜…โ˜…โ˜…โ˜† โ˜…โ˜…โ˜…โ˜…โ˜… โ”‚ โ”‚ ์ถ”๋ก  ๋Šฅ๋ ฅ โ˜…โ˜…โ˜…โ˜…โ˜… โ˜…โ˜…โ˜…โ˜…โ˜† โ˜…โ˜…โ˜…โ˜…โ˜… โ”‚ โ”‚ โ”‚ โ”‚ ์ถ”์ฒœ ์‚ฌ์šฉ ์‚ฌ๋ก€: โ”‚ โ”‚ - GPT-4o: ์‹ค์‹œ๊ฐ„ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ, ์Œ์„ฑ ๋Œ€ํ™”, ๋น ๋ฅธ ์‘๋‹ต ํ•„์š” ์‹œ โ”‚ โ”‚ - Gemini: ๋น„๋””์˜ค ๋ถ„์„, ์ดˆ์žฅ๋ฌธ ๋ฌธ์„œ, ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ๋ณตํ•ฉ ํƒœ์Šคํฌ โ”‚ โ”‚ - Claude: ๋ณต์žกํ•œ ์ถ”๋ก , ์ฝ”๋“œ ๋ฆฌ๋ทฐ, ๊ธด ๋ฌธ์„œ ๋ถ„์„, ์•ˆ์ „์„ฑ ์ค‘์š” ์‹œ โ”‚ โ”‚ โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

5.2 Model Selection by Use Case

```python
def select_model(use_case: str) -> str:
    """Recommend a model for a given use case (2024 update)."""

    recommendations = {
        # Where GPT-4o is the better fit
        "ui_to_code": "gpt-4o",
        "realtime_chat": "gpt-4o",
        "voice_assistant": "gpt-4o-audio-preview",
        "quick_vision": "gpt-4o",

        # Where Gemini is the better fit
        "video_analysis": "gemini-1.5-pro",
        "very_long_document": "gemini-1.5-pro",  # 2M context
        "audio_transcription": "gemini-1.5-pro",
        "multimodal_app": "gemini-1.5-pro",

        # Where Claude is the better fit
        "complex_reasoning": "claude-sonnet-4-20250514",
        "code_review": "claude-sonnet-4-20250514",
        "long_document": "claude-sonnet-4-20250514",  # 200K context
        "safety_critical": "claude-sonnet-4-20250514",

        # Cost optimization
        "high_volume": "gemini-1.5-flash",
        "quick_caption": "gpt-4o-mini",
        "simple_classification": "claude-3-haiku-20240307",
    }

    return recommendations.get(use_case, "gpt-4o")
```
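The context-window row of the comparison table (128K / 2M / 200K) also suggests a simple routing heuristic: send the input to the model with the smallest window that still fits it. A minimal sketch, with models ordered by ascending window size (`CONTEXT_WINDOWS` and `route_by_context` are illustrative names, not library APIs):

```python
# Context windows from the comparison table above.
# Insertion order (smallest window first) drives the routing preference.
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-sonnet-4-20250514": 200_000,
    "gemini-1.5-pro": 2_000_000,
}

def route_by_context(input_tokens: int) -> str:
    """Return the first model whose context window can hold the input."""
    for model, window in CONTEXT_WINDOWS.items():
        if input_tokens <= window:
            return model
    raise ValueError("input exceeds every model's context window")
```

A 150K-token document, for example, skips GPT-4o and lands on Claude; anything beyond 200K tokens routes to Gemini 1.5 Pro.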

6. ๋น„์šฉ ์ตœ์ ํ™”

6.1 ๋น„์šฉ ๊ณ„์‚ฐ

class CostEstimator:
    """API ๋น„์šฉ ์ถ”์ •"""

    # 2024๋…„ ๊ธฐ์ค€ ๊ฐ€๊ฒฉ (USD per 1M tokens)
    PRICING = {
        "gpt-4-vision-preview": {
            "input": 10.0,   # per 1M tokens
            "output": 30.0,  # per 1M tokens
            "image_low": 85,   # tokens
            "image_high": 765, # tokens (base) + tiles
        },
        "gpt-4o": {
            "input": 5.0,    # per 1M tokens
            "output": 15.0,  # per 1M tokens
            "image_low": 85,
            "image_high": 765,
        },
        "gpt-4o-mini": {
            "input": 0.15,   # per 1M tokens
            "output": 0.60,  # per 1M tokens
            "image_low": 85,
            "image_high": 765,
        },
        "gemini-1.5-pro": {
            "input": 1.25,   # per 1M tokens
            "output": 5.0,
            "image": 258,  # tokens per image
            "video": 263,  # tokens per second
            "audio": 32,   # tokens per second
        },
        "gemini-1.5-flash": {
            "input": 0.075,
            "output": 0.30,
        },
        "claude-3-opus": {
            "input": 15.0,   # per 1M tokens
            "output": 75.0,
        },
        "claude-sonnet-4-20250514": {
            "input": 3.0,    # per 1M tokens
            "output": 15.0,
        },
        "claude-3-haiku": {
            "input": 0.25,   # per 1M tokens
            "output": 1.25,
        },
    }

    def estimate_gpt4v_cost(
        self,
        num_images: int,
        avg_prompt_tokens: int,
        avg_response_tokens: int,
        detail: str = "high"
    ) -> float:
        """Estimate GPT-4V cost for num_images single-image requests."""

        pricing = self.PRICING["gpt-4-vision-preview"]

        # Image tokens depend on the detail setting
        if detail == "low":
            image_tokens = num_images * pricing["image_low"]
        else:
            image_tokens = num_images * pricing["image_high"]

        # avg_* values are per request, so scale by the number of requests
        total_input = num_images * avg_prompt_tokens + image_tokens
        total_output = num_images * avg_response_tokens

        # Prices are quoted per 1M tokens
        cost = (total_input / 1_000_000 * pricing["input"] +
                total_output / 1_000_000 * pricing["output"])

        return cost

    def estimate_gemini_cost(
        self,
        num_images: int = 0,
        video_seconds: int = 0,
        audio_seconds: int = 0,
        text_chars: int = 0,
        output_chars: int = 0,
        model: str = "gemini-1.5-pro"
    ) -> float:
        """Estimate Gemini cost; text sizes are given in characters."""

        pricing = self.PRICING[model]

        # Rough conversion: ~4 characters per token
        input_tokens = text_chars / 4
        output_tokens = output_chars / 4

        if model == "gemini-1.5-pro":
            # Multimedia inputs are billed as additional input tokens
            input_tokens += num_images * pricing["image"]
            input_tokens += video_seconds * pricing["video"]
            input_tokens += audio_seconds * pricing["audio"]

        # Prices are quoted per 1M tokens
        input_cost = input_tokens / 1_000_000 * pricing["input"]
        output_cost = output_tokens / 1_000_000 * pricing["output"]

        return input_cost + output_cost


# ์‚ฌ์šฉ ์˜ˆ์‹œ
estimator = CostEstimator()

# 100๊ฐœ ์ด๋ฏธ์ง€ ๋ถ„์„ ๋น„์šฉ ๋น„๊ต
gpt4v_cost = estimator.estimate_gpt4v_cost(
    num_images=100,
    avg_prompt_tokens=100,
    avg_response_tokens=500,
    detail="high"
)

gemini_cost = estimator.estimate_gemini_cost(
    num_images=100,
    text_chars=500,
    output_chars=2000,
    model="gemini-1.5-pro"
)

print(f"GPT-4V cost: ${gpt4v_cost:.2f}")
print(f"Gemini Pro cost: ${gemini_cost:.2f}")
```
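The flat `image_high` value of 765 tokens in `PRICING` corresponds to a 1024x1024 image. OpenAI's published accounting for high-detail images is 85 base tokens plus 170 tokens per 512x512 tile, after rescaling the image to fit within 2048x2048 and then shrinking its shorter side to at most 768. A sketch of that rule (`gpt4_image_tokens` is an illustrative helper, not part of any SDK):

```python
import math

def gpt4_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Approximate image token count under OpenAI's published tile rule."""
    if detail == "low":
        return 85  # low detail is a flat fee regardless of size
    # 1) Scale the image down to fit within 2048 x 2048
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # 2) Scale so the shorter side is at most 768
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # 3) 85 base tokens + 170 per 512 x 512 tile
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

print(gpt4_image_tokens(1024, 1024))  # 765 -- matches PRICING["image_high"]
print(gpt4_image_tokens(640, 480, detail="low"))  # 85
```

Larger or very elongated images tile differently, so real bills can exceed the flat 765-token approximation used by `estimate_gpt4v_cost`.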

์ฐธ๊ณ  ์ž๋ฃŒ

๊ณต์‹ ๋ฌธ์„œ

๋ฒค์น˜๋งˆํฌ

๊ด€๋ จ ๋ ˆ์Šจ
