MLflow Basics

1. MLflow Overview

MLflow is an open-source platform for managing the machine learning lifecycle. It provides integrated support for experiment tracking, model packaging, and deployment.

1.1 The Four MLflow Components

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                          MLflow Components                          β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                     β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚   β”‚    Tracking     β”‚    β”‚    Projects     β”‚    β”‚     Models      β”‚ β”‚
β”‚   β”‚                 β”‚    β”‚                 β”‚    β”‚                 β”‚ β”‚
β”‚   β”‚ - experiment    β”‚    β”‚ - reproducible  β”‚    β”‚ - model format  β”‚ β”‚
β”‚   β”‚   tracking      β”‚    β”‚   projects      β”‚    β”‚ - multiple      β”‚ β”‚
β”‚   β”‚ - metrics       β”‚    β”‚ - dependency    β”‚    β”‚   flavors       β”‚ β”‚
β”‚   β”‚ - parameters    β”‚    β”‚   management    β”‚    β”‚                 β”‚ β”‚
β”‚   β”‚ - artifacts     β”‚    β”‚                 β”‚    β”‚                 β”‚ β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                                                                     β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”‚
β”‚   β”‚                    Model Registry                     β”‚         β”‚
β”‚   β”‚                                                       β”‚         β”‚
β”‚   β”‚ - model versioning - stage transitions - descriptions β”‚         β”‚
β”‚   β”‚                                                       β”‚         β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β”‚
β”‚                                                                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

1.2 Installation and Setup

# Install MLflow
pip install mlflow

# Optional extras
pip install mlflow[extras]  # all optional features
pip install mlflow[sklearn]  # scikit-learn support
pip install mlflow[pytorch]  # PyTorch support

# Check the version
mlflow --version

2. MLflow Tracking

2.1 Core Concepts

"""
MLflow Tracking κΈ°λ³Έ κ°œλ…
"""

# 핡심 μš©μ–΄
mlflow_concepts = {
    "Experiment": "κ΄€λ ¨ μ‹€ν–‰λ“€μ˜ κ·Έλ£Ή (예: 'churn-prediction')",
    "Run": "ν•˜λ‚˜μ˜ ν•™μŠ΅ μ‹€ν–‰ (νŒŒλΌλ―Έν„°, λ©”νŠΈλ¦­, μ•„ν‹°νŒ©νŠΈ 포함)",
    "Parameters": "μž…λ ₯ μ„€μ • (learning_rate, epochs λ“±)",
    "Metrics": "좜λ ₯ κ²°κ³Ό (accuracy, loss λ“±)",
    "Artifacts": "파일 (λͺ¨λΈ, κ·Έλž˜ν”„, 데이터 λ“±)",
    "Tags": "싀행에 λŒ€ν•œ 메타데이터"
}

2.2 A First Experiment

"""
MLflow κΈ°λ³Έ μ‚¬μš©λ²•
"""

import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# 데이터 μ€€λΉ„
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# μ‹€ν—˜ μ„€μ •
mlflow.set_experiment("iris-classification")

# μ‹€ν–‰ μ‹œμž‘
with mlflow.start_run(run_name="random-forest-baseline"):
    # 1. νŒŒλΌλ―Έν„° λ‘œκΉ…
    params = {
        "n_estimators": 100,
        "max_depth": 5,
        "random_state": 42
    }
    mlflow.log_params(params)

    # 2. λͺ¨λΈ ν•™μŠ΅
    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)

    # 3. 예츑
    y_pred = model.predict(X_test)

    # 4. λ©”νŠΈλ¦­ λ‘œκΉ…
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average='macro'),
        "recall": recall_score(y_test, y_pred, average='macro'),
        "f1": f1_score(y_test, y_pred, average='macro')
    }
    mlflow.log_metrics(metrics)

    # 5. λͺ¨λΈ λ‘œκΉ…
    mlflow.sklearn.log_model(model, "model")

    # 6. νƒœκ·Έ μΆ”κ°€
    mlflow.set_tag("model_type", "RandomForest")
    mlflow.set_tag("developer", "ML Team")

    # μ‹€ν–‰ 정보 좜λ ₯
    run = mlflow.active_run()
    print(f"Run ID: {run.info.run_id}")
    print(f"Experiment ID: {run.info.experiment_id}")
    print(f"Metrics: {metrics}")

2.3 νŒŒλΌλ―Έν„° 및 λ©”νŠΈλ¦­ λ‘œκΉ…

"""
λ‹€μ–‘ν•œ λ‘œκΉ… 방법
"""

import mlflow
import numpy as np

with mlflow.start_run():
    # 단일 νŒŒλΌλ―Έν„°
    mlflow.log_param("learning_rate", 0.001)

    # 닀쀑 νŒŒλΌλ―Έν„°
    mlflow.log_params({
        "batch_size": 32,
        "epochs": 100,
        "optimizer": "adam"
    })

    # 단일 λ©”νŠΈλ¦­
    mlflow.log_metric("accuracy", 0.95)

    # 닀쀑 λ©”νŠΈλ¦­
    mlflow.log_metrics({
        "precision": 0.93,
        "recall": 0.91,
        "f1": 0.92
    })

    # μŠ€ν…λ³„ λ©”νŠΈλ¦­ (ν•™μŠ΅ 곑선)
    for epoch in range(100):
        train_loss = 1.0 / (epoch + 1) + np.random.random() * 0.1
        val_loss = 1.0 / (epoch + 1) + np.random.random() * 0.15
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)

    # νƒœκ·Έ (검색 κ°€λŠ₯ν•œ 메타데이터)
    mlflow.set_tag("data_version", "v2.0")
    mlflow.set_tag("experiment_type", "baseline")

    # 닀쀑 νƒœκ·Έ
    mlflow.set_tags({
        "feature_set": "full",
        "preprocessing": "standardized"
    })

2.4 Logging Artifacts

"""
μ•„ν‹°νŒ©νŠΈ λ‘œκΉ…
"""

import mlflow
import matplotlib.pyplot as plt
import pandas as pd
import json

with mlflow.start_run():
    # 1. 파일 λ‘œκΉ…
    # 단일 파일
    with open("config.json", "w") as f:
        json.dump({"key": "value"}, f)
    mlflow.log_artifact("config.json")

    # 디렉토리 전체
    mlflow.log_artifacts("./outputs", artifact_path="results")

    # 2. κ·Έλž˜ν”„ λ‘œκΉ…
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [1, 4, 9])
    ax.set_title("Training Curve")
    mlflow.log_figure(fig, "training_curve.png")
    plt.close()

    # 3. DataFrame λ‘œκΉ… (CSV)
    df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
    df.to_csv("data.csv", index=False)
    mlflow.log_artifact("data.csv")

    # 4. λ”•μ…”λ„ˆλ¦¬λ₯Ό JSON으둜
    results = {"accuracy": 0.95, "model": "RF"}
    mlflow.log_dict(results, "results.json")

    # 5. ν…μŠ€νŠΈ λ‘œκΉ…
    mlflow.log_text("This is a log message", "log.txt")

3. MLflow UI

3.1 Starting the Server

# Start a local server (default)
mlflow ui

# Specify the port
mlflow ui --port 5000

# Specify the host (allow external connections)
mlflow ui --host 0.0.0.0 --port 5000

# Specify the backend store
mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns \
    --host 0.0.0.0 \
    --port 5000

3.2 Setting the Tracking URI

"""
Tracking URI μ„€μ • 방법
"""

import mlflow

# 방법 1: μ½”λ“œμ—μ„œ μ„€μ •
mlflow.set_tracking_uri("http://localhost:5000")

# 방법 2: ν™˜κ²½ λ³€μˆ˜
# export MLFLOW_TRACKING_URI=http://localhost:5000

# 방법 3: 파일 기반 (κΈ°λ³Έκ°’)
mlflow.set_tracking_uri("file:///path/to/mlruns")

# ν˜„μž¬ μ„€μ • 확인
print(mlflow.get_tracking_uri())
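The environment-variable option combines naturally with a code-side fallback. A small sketch (the `file:./mlruns` default is an assumption chosen to mirror MLflow's local default store):

```python
import os

# Use the environment variable if set, otherwise fall back to a local file store.
tracking_uri = os.environ.get("MLFLOW_TRACKING_URI", "file:./mlruns")
print(tracking_uri)
# mlflow.set_tracking_uri(tracking_uri)  # apply before starting any runs
```

This keeps local development zero-config while letting deployments point at a shared tracking server purely via the environment.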

3.3 Making Use of the UI

"""
UIμ—μ„œ ν™œμš©ν•  수 μžˆλŠ” κΈ°λŠ₯λ“€
"""

# 1. μ‹€ν—˜ 비ꡐλ₯Ό μœ„ν•œ κ΅¬μ‘°ν™”λœ λ‘œκΉ…
experiments_to_compare = [
    {"n_estimators": 50, "max_depth": 3},
    {"n_estimators": 100, "max_depth": 5},
    {"n_estimators": 200, "max_depth": 10}
]

for params in experiments_to_compare:
    with mlflow.start_run():
        mlflow.log_params(params)
        # ν•™μŠ΅ 및 평가
        accuracy = train_and_evaluate(params)
        mlflow.log_metric("accuracy", accuracy)

# 2. 검색 κ°€λŠ₯ν•œ νƒœκ·Έ μΆ”κ°€
with mlflow.start_run():
    mlflow.set_tags({
        "model_type": "RandomForest",
        "feature_version": "v2",
        "data_split": "stratified"
    })

# 3. μ‹€ν–‰ 검색 (API)
runs = mlflow.search_runs(
    experiment_names=["iris-classification"],
    filter_string="metrics.accuracy > 0.9 and params.max_depth = '5'",
    order_by=["metrics.accuracy DESC"]
)
print(runs[["run_id", "params.n_estimators", "metrics.accuracy"]])

4. Basic Usage Examples

4.1 A scikit-learn Model

"""
scikit-learn λͺ¨λΈ 전체 예제
"""

import mlflow
import mlflow.sklearn
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# 데이터 λ‘œλ“œ
wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=42
)

# μ‹€ν—˜ μ„€μ •
mlflow.set_experiment("wine-classification")

# ν•˜μ΄νΌνŒŒλΌλ―Έν„° κ·Έλ¦¬λ“œ
param_grid = [
    {"n_estimators": 50, "learning_rate": 0.1, "max_depth": 3},
    {"n_estimators": 100, "learning_rate": 0.05, "max_depth": 5},
    {"n_estimators": 200, "learning_rate": 0.01, "max_depth": 7}
]

for params in param_grid:
    with mlflow.start_run(run_name=f"gb-n{params['n_estimators']}"):
        # νŒŒλΌλ―Έν„° λ‘œκΉ…
        mlflow.log_params(params)
        mlflow.log_param("test_size", 0.2)

        # νŒŒμ΄ν”„λΌμΈ 생성
        pipeline = Pipeline([
            ("scaler", StandardScaler()),
            ("classifier", GradientBoostingClassifier(**params, random_state=42))
        ])

        # ꡐ차 검증
        cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)
        mlflow.log_metric("cv_mean", cv_scores.mean())
        mlflow.log_metric("cv_std", cv_scores.std())

        # μ΅œμ’… ν•™μŠ΅
        pipeline.fit(X_train, y_train)
        y_pred = pipeline.predict(X_test)

        # λ©”νŠΈλ¦­ λ‘œκΉ…
        from sklearn.metrics import accuracy_score, precision_score, recall_score
        mlflow.log_metrics({
            "test_accuracy": accuracy_score(y_test, y_pred),
            "test_precision": precision_score(y_test, y_pred, average='macro'),
            "test_recall": recall_score(y_test, y_pred, average='macro')
        })

        # Confusion Matrix μ‹œκ°ν™”
        fig, ax = plt.subplots(figsize=(8, 6))
        cm = confusion_matrix(y_test, y_pred)
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax)
        ax.set_xlabel('Predicted')
        ax.set_ylabel('Actual')
        ax.set_title('Confusion Matrix')
        mlflow.log_figure(fig, "confusion_matrix.png")
        plt.close()

        # Feature Importance
        classifier = pipeline.named_steps['classifier']
        fig, ax = plt.subplots(figsize=(10, 6))
        importance = classifier.feature_importances_
        indices = np.argsort(importance)[::-1]
        ax.barh(range(len(importance)), importance[indices])
        ax.set_yticks(range(len(importance)))
        ax.set_yticklabels([wine.feature_names[i] for i in indices])
        ax.set_title('Feature Importance')
        mlflow.log_figure(fig, "feature_importance.png")
        plt.close()

        # λͺ¨λΈ λ‘œκΉ…
        mlflow.sklearn.log_model(pipeline, "model")

        print(f"Params: {params}")
        print(f"Test Accuracy: {accuracy_score(y_test, y_pred):.4f}")

4.2 A PyTorch Model

"""
PyTorch λͺ¨λΈ MLflow 좔적
"""

import mlflow
import mlflow.pytorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import numpy as np

# 데이터 μ€€λΉ„
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

X_train_t = torch.FloatTensor(X_train)
y_train_t = torch.LongTensor(y_train)
X_test_t = torch.FloatTensor(X_test)
y_test_t = torch.LongTensor(y_test)

train_dataset = TensorDataset(X_train_t, y_train_t)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# λͺ¨λΈ μ •μ˜
class SimpleNN(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.fc3 = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# μ‹€ν—˜ μ„€μ •
mlflow.set_experiment("pytorch-classification")

# ν•˜μ΄νΌνŒŒλΌλ―Έν„°
params = {
    "hidden_dim": 64,
    "learning_rate": 0.001,
    "epochs": 50,
    "batch_size": 32
}

with mlflow.start_run():
    mlflow.log_params(params)

    # λͺ¨λΈ μ΄ˆκΈ°ν™”
    model = SimpleNN(20, params["hidden_dim"], 2)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=params["learning_rate"])

    # ν•™μŠ΅
    model.train()
    for epoch in range(params["epochs"]):
        total_loss = 0
        for batch_X, batch_y in train_loader:
            optimizer.zero_grad()
            outputs = model(batch_X)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()

        avg_loss = total_loss / len(train_loader)
        mlflow.log_metric("train_loss", avg_loss, step=epoch)

        # 검증 (λ§€ 10 에폭)
        if (epoch + 1) % 10 == 0:
            model.eval()
            with torch.no_grad():
                outputs = model(X_test_t)
                _, predicted = torch.max(outputs, 1)
                accuracy = (predicted == y_test_t).float().mean().item()
                mlflow.log_metric("val_accuracy", accuracy, step=epoch)
            model.train()

    # μ΅œμ’… 평가
    model.eval()
    with torch.no_grad():
        outputs = model(X_test_t)
        _, predicted = torch.max(outputs, 1)
        test_accuracy = (predicted == y_test_t).float().mean().item()

    mlflow.log_metric("test_accuracy", test_accuracy)

    # λͺ¨λΈ λ‘œκΉ…
    mlflow.pytorch.log_model(model, "model")

    print(f"Test Accuracy: {test_accuracy:.4f}")

5. Autologging

5.1 μžλ™ λ‘œκΉ… μ„€μ •

"""
MLflow Autologging
"""

import mlflow

# 전체 ν”„λ ˆμž„μ›Œν¬ μžλ™ λ‘œκΉ…
mlflow.autolog()

# νŠΉμ • ν”„λ ˆμž„μ›Œν¬λ§Œ
mlflow.sklearn.autolog()
mlflow.pytorch.autolog()
mlflow.tensorflow.autolog()
mlflow.xgboost.autolog()
mlflow.lightgbm.autolog()

# μžλ™ λ‘œκΉ… λΉ„ν™œμ„±ν™”
mlflow.autolog(disable=True)

5.2 Autologging Example

"""
sklearn autologging μ˜ˆμ‹œ
"""

import mlflow
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Autologging ν™œμ„±ν™”
mlflow.sklearn.autolog(
    log_input_examples=True,      # μž…λ ₯ μ˜ˆμ‹œ λ‘œκΉ…
    log_model_signatures=True,    # λͺ¨λΈ μ‹œκ·Έλ‹ˆμ²˜ λ‘œκΉ…
    log_models=True,              # λͺ¨λΈ λ‘œκΉ…
    log_datasets=True,            # 데이터셋 정보 λ‘œκΉ…
    silent=False                  # λ‘œκΉ… λ©”μ‹œμ§€ 좜λ ₯
)

# 데이터 μ€€λΉ„
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# μ‹€ν—˜ (μžλ™μœΌλ‘œ λͺ¨λ“  것이 λ‘œκΉ…λ¨)
mlflow.set_experiment("autolog-demo")

# λͺ¨λΈ ν•™μŠ΅ (μžλ™μœΌλ‘œ run 생성 및 λ‘œκΉ…)
model = RandomForestClassifier(n_estimators=100, max_depth=5)
model.fit(X_train, y_train)

# μžλ™μœΌλ‘œ λ‘œκΉ…λ˜λŠ” ν•­λͺ©:
# - νŒŒλΌλ―Έν„°: n_estimators, max_depth, ...
# - λ©”νŠΈλ¦­: training_score, ...
# - μ•„ν‹°νŒ©νŠΈ: model, feature_importance, ...

6. λͺ¨λΈ λ‘œλ“œ 및 예츑

6.1 μ €μž₯된 λͺ¨λΈ λ‘œλ“œ

"""
μ €μž₯된 λͺ¨λΈ λ‘œλ“œ 방법
"""

import mlflow
import mlflow.sklearn

# 방법 1: Run ID둜 λ‘œλ“œ
model = mlflow.sklearn.load_model("runs:/RUN_ID/model")

# 방법 2: μ•„ν‹°νŒ©νŠΈ 경둜둜 λ‘œλ“œ
model = mlflow.sklearn.load_model("file:///path/to/mlruns/0/run_id/artifacts/model")

# 방법 3: Model Registryμ—μ„œ λ‘œλ“œ (λ‹€μŒ λ ˆμŠ¨μ—μ„œ μžμ„Ένžˆ)
model = mlflow.sklearn.load_model("models:/MyModel/Production")

# 방법 4: pyfunc으둜 λ‘œλ“œ (ν”„λ ˆμž„μ›Œν¬ 무관)
model = mlflow.pyfunc.load_model("runs:/RUN_ID/model")

# 예츑
predictions = model.predict(X_test)

6.2 Querying Recent Runs

"""
μ‹€ν—˜ κ²°κ³Ό 쑰회
"""

import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# μ‹€ν—˜ 쑰회
experiment = client.get_experiment_by_name("iris-classification")
print(f"Experiment ID: {experiment.experiment_id}")

# μ‹€ν–‰ 검색
runs = client.search_runs(
    experiment_ids=[experiment.experiment_id],
    filter_string="metrics.accuracy > 0.9",
    order_by=["metrics.accuracy DESC"],
    max_results=5
)

for run in runs:
    print(f"Run ID: {run.info.run_id}")
    print(f"  Accuracy: {run.data.metrics.get('accuracy')}")
    print(f"  Params: {run.data.params}")

# 졜고 μ„±λŠ₯ λͺ¨λΈ λ‘œλ“œ
best_run = runs[0]
best_model = mlflow.sklearn.load_model(f"runs:/{best_run.info.run_id}/model")

Exercises

Exercise 1: Basic Experiment Tracking

Train a survival-prediction model on the Titanic dataset and track the experiment with MLflow.

# Hint
import mlflow
from sklearn.datasets import fetch_openml

titanic = fetch_openml("titanic", version=1, as_frame=True)
# Preprocess the data, then train a model
# Log parameters and metrics with mlflow

Exercise 2: Comparing Hyperparameters

Run at least five experiments with different hyperparameters and compare them in the MLflow UI.
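One way to generate the runs for this exercise is to expand a small grid. A minimal sketch (the `search_space` values are placeholders, and the `mlflow` calls are shown as comments since the training code is yours to write):

```python
from itertools import product

# Hypothetical search space for the exercise.
search_space = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5],
}

# Expand the dict-of-lists into a list of param dicts (3 x 2 = 6 combinations).
keys = list(search_space)
param_grid = [dict(zip(keys, values)) for values in product(*search_space.values())]

for params in param_grid:
    # Each combination would become one MLflow run, e.g.:
    #   with mlflow.start_run():
    #       mlflow.log_params(params)
    #       mlflow.log_metric("accuracy", train_and_evaluate(params))
    print(params)
```

Because every run logs the same parameter keys, the UI's compare view can line the runs up column by column.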


Summary

Feature          Method                                   Description
---------------  ---------------------------------------  ---------------------------
Set experiment   mlflow.set_experiment()                  Choose the experiment group
Start run        mlflow.start_run()                       Begin a new run
Parameters       mlflow.log_param() / log_params()        Log input parameters
Metrics          mlflow.log_metric() / log_metrics()      Log output metrics
Artifacts        mlflow.log_artifact() / log_artifacts()  Log files
Model            mlflow.sklearn.log_model()               Save the model
Autologging      mlflow.autolog()                         Enable automatic tracking
