MLOps Overview¶
1. What is MLOps?¶
MLOps (Machine Learning Operations) is a practical methodology for automating and efficiently managing the development, deployment, and operation of machine learning models.
Problems with Traditional ML Projects:
- Models only work in notebooks
- Experiments cannot be reproduced
- Difficult to detect performance degradation after deployment
- Manual retraining required
Goals of MLOps:
- Automated ML pipelines
- Reproducible experiments
- Continuous model monitoring
- Automated retraining and deployment
1.1 Core Principles of MLOps¶
"""
Core Principles of MLOps
"""
# 1. Reproducibility
# - Must be able to get identical results with the same data and code
experiment_config = {
"data_version": "v1.2.0",
"code_version": "git:abc123",
"random_seed": 42,
"hyperparameters": {"lr": 0.001, "epochs": 100}
}
# 2. Automation
# - Minimize manual work, automate with pipelines
pipeline_stages = [
"data_validation",
"feature_engineering",
"model_training",
"model_evaluation",
"model_deployment"
]
# 3. Monitoring
# - Continuously monitor model performance and data quality
monitoring_metrics = {
"model_accuracy": 0.95,
"prediction_latency_p99": "50ms",
"data_drift_score": 0.02
}
# 4. Versioning
# - Version control for data, code, and models
versions = {
"data": "s3://bucket/data/v2.0/",
"model": "models/classifier_v3.2.1",
"config": "configs/production.yaml"
}
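In practice, reproducibility starts with pinning every source of randomness before training. A minimal sketch (the PyTorch lines are an assumption about your stack, so they are left commented out):

import random

import numpy as np

def set_seed(seed: int = 42) -> None:
    """Pin the common sources of randomness so reruns match."""
    random.seed(seed)     # Python standard-library RNG
    np.random.seed(seed)  # NumPy global RNG
    # Assumption: if you train with PyTorch, pin its RNGs as well:
    # import torch
    # torch.manual_seed(seed)
    # torch.cuda.manual_seed_all(seed)

set_seed(42)  # matches "random_seed" in experiment_config above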
2. DevOps vs MLOps¶
2.1 Traditional DevOps¶
Developer → Write Code → Build → Test → Deploy → Monitor
    ↑                                               │
    └────────────────── Feedback ───────────────────┘
2.2 Additional Elements in MLOps¶
Data Scientist → Data → Feature Engineering → Model Training → Validation → Deploy → Monitor
      ↑                                                                        │
      │ ← Detect Data Drift ───────────────────────────────────────────────────┤
      │ ← Detect Model Performance Degradation ─────────────────────────────────┤
      └── Retrain Trigger ──────────────────────────────────────────────────────┘
2.3 Key Differences¶
"""
DevOps vs MLOps Differences
"""
# DevOps: Code-centric
devops_artifacts = {
"source": "application_code",
"build": "docker_image",
"test": "unit_tests, integration_tests",
"deploy": "kubernetes_manifests"
}
# MLOps: Data + Code + Model
mlops_artifacts = {
"data": "training_data, validation_data",
"features": "feature_definitions, feature_store",
"code": "training_scripts, serving_code",
"model": "model_weights, model_metadata",
"experiments": "hyperparameters, metrics, artifacts"
}
# MLOps-specific Challenges
mlops_challenges = [
"Data quality management",
"Feature consistency maintenance",
"Model version control",
"A/B testing",
"Drift detection",
"Model interpretability"
]
| Category | DevOps | MLOps |
|---|---|---|
| Main Artifacts | Application code | Data + Model + Code |
| Testing | Unit/Integration tests | + Data validation, Model validation |
| Deployment Unit | Container/Service | Model + Serving infrastructure |
| Monitoring | System metrics | + Model performance, Data drift |
| Rollback | Deploy previous version | Model version rollback + Data considerations |
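Rollback is a good example of the difference: it is not just redeploying an old binary; the model registry must also be updated so serving and retraining agree on the active version. A sketch using the MLflow Model Registry client (the model name and version numbers are illustrative, and newer MLflow releases favor model aliases over stages, but the idea is the same):

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Hypothetical model name and versions: demote the bad version,
# then promote the last known-good version back to Production.
client.transition_model_version_stage(
    name="fraud-detector", version="5", stage="Archived"
)
client.transition_model_version_stage(
    name="fraud-detector", version="4", stage="Production"
)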
3. MLOps Maturity Levels¶
3.1 Google's MLOps Maturity Model¶
Level 0: Manual Process
├── Experimentation in Jupyter notebooks
├── Manual model deployment
├── No monitoring
└── No retraining triggers

Level 1: ML Pipeline Automation
├── Automated training pipeline
├── Experiment tracking
├── Model registry
└── Basic monitoring

Level 2: CI/CD Pipeline
├── Automated testing (code + data + model)
├── Continuous Training (CT)
├── Automated redeployment
└── Complete monitoring and alerting
3.2 Characteristics by Maturity Level¶
"""
Implementation by MLOps Maturity Level
"""
# Level 0: Manual ML
class Level0_ManualML:
"""
- Data scientists experiment in notebooks
- Engineers manually deploy models
- Infrequent deployment (quarterly/annual)
"""
    def train(self):
        # Run ad hoc in a notebook
        data = load_data()  # no data or code versioning
        model = train_model(data)
        model.save("model.pkl")
        # Manually copy the file to the server
# Level 1: ML Pipeline
class Level1_Pipeline:
"""
- Automated training pipeline
- Experiment tracking (MLflow)
- Model registry
"""
def train_pipeline(self):
with mlflow.start_run():
data = load_data()
model = train_model(data)
mlflow.log_metrics(evaluate(model))
mlflow.sklearn.log_model(model, "model")
# Level 2: CI/CD/CT
class Level2_CICDCT:
"""
- Automated testing on code changes
- Automated retraining on data changes
- Automated rollback on performance degradation
"""
def continuous_training(self):
if detect_drift() or new_data_available():
trigger_training_pipeline()
if model_performance_degraded():
rollback_to_previous_version()
3.3 Maturity Checklist¶
# mlops_maturity_checklist.yaml
level_0:
- Notebook-based experimentation
- Manual model deployment
- No documentation
level_1:
- Automated training pipeline
- Experiment tracking system (MLflow/W&B)
- Model version control
- Basic monitoring
level_2:
- CI/CD for ML
- Automated testing (data/model)
- Continuous Training
- Drift detection and alerting
- A/B testing infrastructure
- Feature store
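The Level 2 item "Drift detection and alerting" can start very simply, for example a two-sample test comparing a feature's training distribution against recent serving data. A minimal sketch using SciPy (the 0.05 threshold and the synthetic data are assumptions):

import numpy as np
from scipy import stats

def feature_drifted(train_values, recent_values, alpha: float = 0.05) -> bool:
    """Two-sample Kolmogorov-Smirnov test on one numeric feature."""
    _, p_value = stats.ks_2samp(train_values, recent_values)
    return p_value < alpha  # small p-value: distributions likely differ

# Synthetic example: "serving" data shifted away from training data
rng = np.random.default_rng(42)
train = rng.normal(0.0, 1.0, size=5_000)
serving = rng.normal(0.5, 1.0, size=1_000)

if feature_drifted(train, serving):
    print("Drift detected - trigger the retraining pipeline")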
4. MLOps Tool Ecosystem¶
4.1 Tools by Category¶
┌──────────────────────────────────────────────────────────────────────┐
│                         MLOps Tool Ecosystem                         │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  [Experiment Tracking]     [Pipeline]         [Feature Store]        │
│  - MLflow                  - Kubeflow         - Feast                │
│  - Weights & Biases        - Airflow          - Tecton               │
│  - Neptune                 - Prefect          - Hopsworks            │
│  - Comet ML                - Dagster                                 │
│                                                                      │
│  [Model Registry]          [Serving]          [Monitoring]           │
│  - MLflow Registry         - TorchServe       - Evidently            │
│  - Vertex AI               - Triton           - Grafana              │
│  - SageMaker               - TFServing        - Prometheus           │
│  - Neptune                 - BentoML          - WhyLabs              │
│                            - Seldon                                  │
│                                                                      │
│  [Data Versioning]         [Labeling]         [Infrastructure]       │
│  - DVC                     - Label Studio     - Kubernetes           │
│  - Delta Lake              - Labelbox         - Docker               │
│  - LakeFS                  - Prodigy          - Terraform            │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
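To make the pipeline category concrete, here is a minimal sketch of the five stages from section 1.1 as an Airflow DAG (assumes Airflow 2.4+ for the `schedule` argument; the weekly cadence and placeholder task bodies are assumptions):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_stage(stage: str) -> None:
    print(f"Running {stage}")  # placeholder for the real stage logic

with DAG(
    dag_id="ml_training_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@weekly",  # assumption: weekly retraining cadence
    catchup=False,
) as dag:
    stages = ["data_validation", "feature_engineering",
              "model_training", "model_evaluation", "model_deployment"]
    tasks = [
        PythonOperator(task_id=s, python_callable=run_stage, op_args=[s])
        for s in stages
    ]
    # Chain the stages to run sequentially
    for upstream, downstream in zip(tasks, tasks[1:]):
        upstream >> downstream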
4.2 Major Tool Comparison¶
"""
MLOps Tool Comparison
"""
# Experiment tracking tools
experiment_tracking = {
"MLflow": {
"type": "Open source",
"features": ["Experiment tracking", "Model registry", "Serving"],
"deployment": "self-hosted / managed",
"best_for": "Enterprise, customization"
},
"Weights & Biases": {
"type": "Commercial (free tier available)",
"features": ["Experiment tracking", "Hyperparameter tuning", "Artifacts"],
"deployment": "SaaS / self-hosted",
"best_for": "Deep learning, collaboration, visualization"
},
"Neptune": {
"type": "Commercial (free tier available)",
"features": ["Experiment tracking", "Metadata management"],
"deployment": "SaaS",
"best_for": "Large-scale experiment management"
}
}
# Serving tools
serving_tools = {
"TorchServe": {
"supported": ["PyTorch"],
"features": ["REST/gRPC", "Batch inference", "A/B testing"],
"best_for": "PyTorch models"
},
"Triton": {
"supported": ["PyTorch", "TensorFlow", "ONNX", "TensorRT"],
"features": ["Multi-model", "GPU optimization", "Dynamic batching"],
"best_for": "High-performance inference, multi-framework"
},
"TFServing": {
"supported": ["TensorFlow"],
"features": ["REST/gRPC", "Version management"],
"best_for": "TensorFlow models"
}
}
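Whichever server you choose, clients typically see a plain HTTP interface. As a sketch, calling TensorFlow Serving's REST predict API looks like this (assumes a model named "iris" is already being served on the default REST port 8501):

import requests

# TFServing REST predict API: POST /v1/models/<name>:predict
response = requests.post(
    "http://localhost:8501/v1/models/iris:predict",
    json={"instances": [[5.1, 3.5, 1.4, 0.2]]},
    timeout=5,
)
response.raise_for_status()
print(response.json())  # e.g. {"predictions": [[...]]}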
4.3 Cloud Managed Services¶
"""
Cloud MLOps Platforms
"""
cloud_platforms = {
"AWS SageMaker": {
"components": [
"SageMaker Studio", # Development environment
"SageMaker Pipelines", # Pipeline
"Model Registry", # Model management
"SageMaker Endpoints", # Serving
"Model Monitor" # Monitoring
]
},
"Google Vertex AI": {
"components": [
"Workbench", # Development environment
"Vertex Pipelines", # Pipeline
"Model Registry", # Model management
"Prediction", # Serving
"Model Monitoring" # Monitoring
]
},
"Azure ML": {
"components": [
"Azure ML Studio", # Development environment
"Azure Pipelines", # Pipeline
"Model Registry", # Model management
"Managed Endpoints", # Serving
"Data Drift" # Monitoring
]
}
}
5. MLOps Architecture Patterns¶
5.1 Basic Architecture¶
┌────────────────────────────────────────────────────────────────────┐
│                      Basic MLOps Architecture                      │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│   ┌──────────┐      ┌──────────────┐      ┌──────────────┐         │
│   │   Data   │─────▶│   Feature    │─────▶│    Model     │         │
│   │   Lake   │      │    Store     │      │   Training   │         │
│   └──────────┘      └──────────────┘      └──────┬───────┘         │
│                                                  │                 │
│                                                  ▼                 │
│   ┌──────────┐      ┌──────────────┐      ┌──────────────┐         │
│   │ Monitor  │◀─────│    Model     │◀─────│    Model     │         │
│   │  /Alert  │      │   Serving    │      │   Registry   │         │
│   └────┬─────┘      └──────────────┘      └──────────────┘         │
│        │                                                           │
│        └───────── Retrain Trigger ─────────▶ Training              │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
5.2 Code Example¶
"""
Basic MLOps Pipeline Structure
"""
from typing import Dict, Any
class MLOpsPipeline:
"""Basic MLOps Pipeline"""
    def __init__(self, config: Dict[str, Any]):
        self.config = config
        self.experiment_tracker = None
        self.model_registry = None
        self.feature_store = None
        self.serving_endpoint = None  # used by deploy() below
def data_ingestion(self):
"""Data collection and validation"""
raw_data = self.load_data(self.config["data_source"])
validated_data = self.validate_data(raw_data)
return validated_data
def feature_engineering(self, data):
"""Feature engineering"""
features = self.feature_store.get_features(
entity_ids=data["entity_ids"],
feature_list=self.config["features"]
)
return features
def train(self, features):
"""Model training"""
with self.experiment_tracker.start_run():
model = self.train_model(features)
metrics = self.evaluate(model)
self.experiment_tracker.log_metrics(metrics)
return model
def register(self, model):
"""Model registration"""
if self.passes_quality_gate(model):
self.model_registry.register(
model=model,
stage="staging"
)
def deploy(self, model_version: str):
"""Model deployment"""
model = self.model_registry.load(model_version)
self.serving_endpoint.update(model)
def monitor(self):
"""Model monitoring"""
if self.detect_drift():
self.trigger_retraining()
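Wired together, a run of this pipeline might look like the sketch below. It only works once concrete tracker, registry, store, and endpoint objects are attached; the config values and the version string are hypothetical:

config = {
    "data_source": "s3://bucket/data/v2.0/",  # from the versioning example
    "features": ["age", "income", "tenure"],  # hypothetical feature list
}

pipeline = MLOpsPipeline(config)
# Attach concrete implementations before running, e.g.:
# pipeline.experiment_tracker = an MLflow-backed tracker
# pipeline.model_registry     = a registry client
# pipeline.feature_store      = a Feast client

data = pipeline.data_ingestion()
features = pipeline.feature_engineering(data)
model = pipeline.train(features)
pipeline.register(model)
pipeline.deploy(model_version="staging/latest")  # hypothetical version id
pipeline.monitor()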
6. Getting Started¶
6.1 First MLOps Project¶
"""
Getting Started with MLOps: Experiment Tracking with MLflow
"""
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# MLflow setup
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("iris-classification")
# Prepare data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
iris.data, iris.target, test_size=0.2, random_state=42
)
# Start MLflow experiment
with mlflow.start_run(run_name="random-forest-v1"):
# Log hyperparameters
params = {"n_estimators": 100, "max_depth": 5}
mlflow.log_params(params)
# Train model
model = RandomForestClassifier(**params, random_state=42)
model.fit(X_train, y_train)
# Evaluate and log metrics
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
mlflow.log_metric("accuracy", accuracy)
# Save model
mlflow.sklearn.log_model(model, "model")
print(f"Run ID: {mlflow.active_run().info.run_id}")
print(f"Accuracy: {accuracy:.4f}")
6.2 Local Environment Setup¶
# Start MLflow server
mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root ./mlruns \
--host 0.0.0.0 \
--port 5000
# Check in browser
# http://localhost:5000
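A natural next step is promoting a good run into the Model Registry so it gets a version and a stage. `mlflow.register_model` is the standard call; the registered name "iris-classifier" is illustrative:

import mlflow

mlflow.set_tracking_uri("http://localhost:5000")

run_id = "..."  # paste the Run ID from section 6.1
result = mlflow.register_model(f"runs:/{run_id}/model", "iris-classifier")
print(result.name, result.version)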
Exercises¶
Exercise 1: MLOps Maturity Assessment¶
Evaluate your team's current ML process and determine the maturity level.
Exercise 2: Tool Selection¶
Choose appropriate MLOps tools for the following situations:

- Small team, using PyTorch, budget constraints
- Large team, multi-framework, high performance requirements
Summary¶
| Concept | Description |
|---|---|
| MLOps | Automation of ML model development, deployment, and operation |
| DevOps vs MLOps | MLOps adds data and model management |
| Maturity Level 0 | Manual process, notebook-based |
| Maturity Level 1 | Automated pipeline, experiment tracking |
| Maturity Level 2 | CI/CD/CT, continuous training |
| Core Tools | MLflow, W&B, Kubeflow, Feast, Triton |