MLOps: Deploying and Maintaining AI Models in Production
Successfully trained an AI model to 95% accuracy? Congratulations! But here's the thing: training the model is only about 20% of the work. The rest? Deploying, monitoring, and maintaining the model in production. That is what MLOps (Machine Learning Operations) is all about!
What Is MLOps?
MLOps is the practice of combining Machine Learning, DevOps, and Data Engineering to automate and streamline the deployment and maintenance of AI models in production.
Comparison: ML Research vs MLOps
| Aspek | ML Research | MLOps/Production |
|---|---|---|
| Code | Jupyter notebook | Modular, tested, versioned |
| Data | Static dataset | Streaming, real-time |
| Model | Single trained model | Versioned, A/B tested |
| Deployment | Manual/saved file | Automated pipeline |
| Monitoring | Validation metrics | Real-time performance |
| Updates | Manual retraining | Automated retraining |
ML Lifecycle: From Experiment to Production
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Data Prep   │────▶│   Training   │────▶│   Evaluate   │
└──────────────┘     └──────────────┘     └──────────────┘
                                                  │
                                                  ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Monitor    │◀────│    Deploy    │◀────│     Test     │
│  & Retrain   │     │              │     │              │
└──────────────┘     └──────────────┘     └──────────────┘
Step 1: Model Packaging
Save Model
import joblib
import torch
# Scikit-learn
joblib.dump(model, 'model.pkl')
# TensorFlow/Keras
model.save('my_model.h5')
# PyTorch
torch.save(model.state_dict(), 'model.pth')
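Loading a saved model back is the mirror operation. A minimal sketch using only the standard library's pickle (a stand-in for joblib on scikit-learn models; the ToyModel class here is hypothetical):

```python
import pickle

# Hypothetical stand-in for a trained model object
class ToyModel:
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, xs):
        return [1 if x >= self.threshold else 0 for x in xs]

# Serialize to disk, then load it back, just like joblib.dump/load
model = ToyModel(threshold=0.5)
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

with open('model.pkl', 'rb') as f:
    restored = pickle.load(f)

print(restored.predict([0.2, 0.9]))  # [0, 1]
```

The serving layer should only ever load artifacts; it never retrains.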
Model Registry
Store models with versioning:
MLflow:
import mlflow
import mlflow.sklearn
mlflow.set_experiment("sentiment-analysis")
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.95)
    mlflow.sklearn.log_model(model, "model")
Step 2: Model Deployment
Option 1: REST API (Flask/FastAPI)
FastAPI (Recommended):
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
app = FastAPI()
model = joblib.load('model.pkl')
class PredictionRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(request: PredictionRequest):
    prediction = model.predict([request.text])
    return {"prediction": prediction[0]}
# Run: uvicorn main:app --host 0.0.0.0 --port 8000
Testing:
curl -X POST "http://localhost:8000/predict" \
-H "Content-Type: application/json" \
-d '{"text": "This movie is amazing!"}'
Option 2: Docker Container
Dockerfile:
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY model.pkl .
COPY app.py .
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
Build & Run:
docker build -t my-ml-model .
docker run -p 8000:8000 my-ml-model
Option 3: Cloud Deployment
AWS SageMaker:
import sagemaker
from sagemaker.sklearn import SKLearnModel
model = SKLearnModel(
model_data='s3://my-bucket/model.tar.gz',
role=role,
entry_point='inference.py'
)
predictor = model.deploy(
initial_instance_count=1,
instance_type='ml.m5.large'
)
Google Cloud AI Platform:
gcloud ai-platform models create my_model
gcloud ai-platform versions create v1 \
--model=my_model \
--runtime-version=2.5 \
--python-version=3.7 \
--framework=scikit-learn \
--origin=gs://my-bucket/model/
Step 3: Model Serving Patterns
Pattern 1: Online Serving (Real-time)
- Use case: Chatbot, recommendation, fraud detection
- Latency requirement: < 100ms
- Tech: REST API, gRPC
Pattern 2: Batch Serving
- Use case: Daily report, churn prediction
- Latency requirement: Minutes to hours OK
- Tech: Apache Spark, Airflow
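The batch pattern boils down to: load the model once, score records in chunks, write the results out. A pure-Python sketch (the scoring function, field names, and chunk size are placeholders for your real model and data store):

```python
def score(record):
    # Placeholder for model.predict on a single record
    return 1 if record["amount"] > 100 else 0

def batch_predict(records, chunk_size=2):
    """Score records chunk by chunk, as a nightly batch job would."""
    results = []
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        results.extend({"id": r["id"], "prediction": score(r)} for r in chunk)
    return results

records = [
    {"id": 1, "amount": 50},
    {"id": 2, "amount": 150},
    {"id": 3, "amount": 300},
]
print(batch_predict(records))
```

In production the loop body would read from a warehouse table and write predictions back, orchestrated by Spark or Airflow.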
Pattern 3: Edge Deployment
- Use case: Mobile app, IoT devices
- Constraint: Limited compute, offline capable
- Tech: TensorFlow Lite, ONNX, Core ML
# TensorFlow Lite conversion
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model('my_model')
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
Step 4: Monitoring Models in Production
Metrics to Monitor
1. Model Performance Metrics
# Track prediction confidence
def log_prediction(features, prediction, confidence):
    mlflow.log_metric("confidence", confidence)
    if confidence < 0.7:
        send_alert("Low confidence prediction detected!")  # send_alert: your own alerting hook
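A single low-confidence prediction is noisy; averaging over a sliding window of recent predictions gives a steadier alert signal. A minimal sketch (the window size and 0.7 threshold are illustrative):

```python
from collections import deque

class ConfidenceMonitor:
    """Alert when average confidence over a sliding window drops too low."""

    def __init__(self, window=100, threshold=0.7):
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def record(self, confidence):
        """Add one prediction's confidence; return True if an alert should fire."""
        self.window.append(confidence)
        avg = sum(self.window) / len(self.window)
        return avg < self.threshold

monitor = ConfidenceMonitor(window=3, threshold=0.7)
print(monitor.record(0.9))  # False: average 0.9
print(monitor.record(0.8))  # False: average 0.85
print(monitor.record(0.5))  # False: average ~0.73
print(monitor.record(0.3))  # True: window is now [0.8, 0.5, 0.3], average ~0.53
```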
2. Data Drift Detection
Has the data in production shifted away from the training data?
Evidently AI:
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
report = Report(metrics=[DataDriftPreset()])
report.run(
    reference_data=training_data,
    current_data=production_data
)
report.save_html("drift_report.html")
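To see what a drift library computes under the hood, the Population Stability Index (PSI) is one common score: bin a feature on the reference data, then compare bin frequencies on production data. A pure-Python sketch (the bin edges and the 0.2 alert cutoff are conventional rules of thumb, not Evidently's exact method):

```python
import math

def psi(reference, current, bin_edges):
    """Population Stability Index between two samples of one feature."""
    def proportions(values):
        counts = [0] * (len(bin_edges) - 1)
        for v in values:
            for i in range(len(bin_edges) - 1):
                if bin_edges[i] <= v < bin_edges[i + 1]:
                    counts[i] += 1
                    break
        # Small epsilon avoids log(0) for empty bins
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))

edges = [0, 25, 50, 75, 100]
reference = [10, 20, 30, 40, 60, 70, 80, 90]  # spread across all bins
shifted = [80, 85, 90, 95, 70, 75, 99, 88]    # mass moved to the top bin
print(psi(reference, reference, edges))       # 0.0: identical distributions
print(psi(reference, shifted, edges) > 0.2)   # True: large shift flags drift
```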
3. Concept Drift
Has the relationship between features and the target changed?
Example:
- Training: "iPhone" = luxury item (2010)
- Production: "iPhone" = common item (2024)
- The model needs retraining on fresh data!
Monitoring Tools
| Tool | Use Case |
|---|---|
| Prometheus + Grafana | Infrastructure metrics |
| Evidently AI | ML-specific metrics |
| MLflow | Experiment tracking |
| Weights & Biases | Experiment tracking + visualization |
| WhyLabs | Data drift detection |
Step 5: Retraining Pipeline
Triggers for Retraining
- Scheduled: Retrain every week/month
- Performance-based: Accuracy drops below a threshold
- Data-based: Data drift is detected
- Manual: A data scientist triggers retraining
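The triggers above can be folded into one decision function that the pipeline evaluates on every run. A minimal sketch (the thresholds and status fields are illustrative):

```python
def should_retrain(status, accuracy_threshold=0.90, max_days_since_training=30):
    """Combine scheduled, performance-based, data-based, and manual triggers.

    Returns the name of the trigger that fired, or None.
    """
    if status["days_since_training"] >= max_days_since_training:
        return "scheduled"
    if status["accuracy"] < accuracy_threshold:
        return "performance"
    if status["drift_detected"]:
        return "drift"
    if status.get("manual_request"):
        return "manual"
    return None  # no trigger fired

healthy = {"days_since_training": 5, "accuracy": 0.95, "drift_detected": False}
degraded = {"days_since_training": 5, "accuracy": 0.82, "drift_detected": False}
print(should_retrain(healthy))   # None
print(should_retrain(degraded))  # performance
```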
Automated Retraining with Airflow
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime
def check_drift():
    # Decide whether retraining is needed (detect_drift is your own check)
    return detect_drift()
def retrain_model():
    # Fetch new data
    # Train model
    # Validate
    # Deploy if better
    pass
with DAG('ml_retraining', start_date=datetime(2024, 1, 1)) as dag:
    check = PythonOperator(task_id='check_drift', python_callable=check_drift)
    retrain = PythonOperator(task_id='retrain', python_callable=retrain_model)
    check >> retrain
Step 6: A/B Testing for Models
Test a new model without putting every user at risk.
# Route 10% of traffic to the new model; use a stable hash so each
# user always gets the same variant (built-in hash() is salted per process)
import hashlib
def get_model_version(user_id):
    bucket = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % 100
    if bucket < 10:  # 10%
        return "model_v2"
    return "model_v1"
# Compare metrics
# If v2 better, increase traffic gradually
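"Compare metrics" in practice means checking whether the gap between variants is statistically meaningful, not noise. A minimal two-proportion z-test sketch in pure Python (the 1.96 cutoff corresponds to ~95% confidence; the counts are made up):

```python
import math

def two_proportion_z(success_a, total_a, success_b, total_b):
    """z-statistic for the difference between two conversion rates."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_b - p_a) / se

# model_v1: 500 positive outcomes of 5000; model_v2: 290 of 2500
z = two_proportion_z(500, 5000, 290, 2500)
print(round(z, 2))
print(abs(z) > 1.96)  # True means the difference is significant at ~95%
```

Only when the difference clears significance (and business metrics agree) should v2's traffic share be increased.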
Tools:
- MLflow Model Registry: Manage model versions
- Seldon: Advanced deployment patterns (canary, shadow)
- KFServing: Kubernetes-native model serving
Step 7: CI/CD for ML
ML Pipeline with GitHub Actions
# .github/workflows/ml-pipeline.yml
name: ML Pipeline
on: [push]
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Setup Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Train model
        run: python train.py
      - name: Evaluate
        run: python evaluate.py
      - name: Deploy if good
        if: success()
        run: python deploy.py
MLOps Best Practices
✅ 1. Version Everything
- Code: Git
- Data: DVC (Data Version Control)
- Model: MLflow, Weights & Biases
- Environment: Docker, Conda
✅ 2. Reproducibility
# environment.yml
name: ml-project
channels:
  - conda-forge
dependencies:
  - python=3.9
  - scikit-learn=1.2.0
  - pandas=1.5.0
✅ 3. Testing
- Unit tests for preprocessing
- Integration tests for the pipeline
- Model performance tests
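A model performance test is just an assertion on held-out metrics that runs in CI before deploy. A minimal sketch with a dummy model and plain asserts (in a real suite this would be a pytest test against your evaluation set; the gate value is illustrative):

```python
def evaluate_accuracy(model_fn, examples):
    """Fraction of (features, label) examples the model labels correctly."""
    correct = sum(1 for features, label in examples if model_fn(features) == label)
    return correct / len(examples)

# Dummy stand-in for a trained classifier
def toy_model(x):
    return 1 if x > 0 else 0

holdout = [(-2, 0), (-1, 0), (1, 1), (3, 1), (0.5, 1)]

MIN_ACCURACY = 0.8  # deployment gate; tune to your baseline

accuracy = evaluate_accuracy(toy_model, holdout)
assert accuracy >= MIN_ACCURACY, f"accuracy {accuracy} below gate {MIN_ACCURACY}"
print(accuracy)  # 1.0
```

If the assertion fails, CI stops and the deploy step never runs.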
✅ 4. Documentation
- Model cards (data, limitations, bias)
- API documentation
- Runbooks for on-call
✅ 5. Security
- Encrypt model artifacts
- Access control for the model API
- Audit logs for predictions
Popular MLOps Tools
End-to-End Platforms
- Kubeflow: Kubernetes-native ML workflows
- MLflow: Experiment tracking, model registry, deployment
- Azure Machine Learning: Cloud MLOps platform
- AWS SageMaker: Managed ML platform
Specialized Tools
| Category | Tools |
|---|---|
| Experiment Tracking | MLflow, Weights & Biases, Neptune |
| Data Versioning | DVC, Pachyderm |
| Feature Store | Feast, Tecton |
| Model Serving | Seldon, KFServing, BentoML |
| Monitoring | Evidently, WhyLabs, Arize |
Conclusion
MLOps is the bridge between ML research and production. Without it, even a great AI model just sits in a notebook and never delivers value.
Key takeaways:
- Deployment = REST API, Docker, or a cloud service
- Monitoring = Track performance, data drift, and concept drift
- Retraining = Automated pipelines to keep the model fresh
- Testing = A/B testing for safe model updates
- Tools = MLflow, Kubeflow, Evidently, and more
Next step: Try deploying a simple model with Flask/FastAPI, containerize it with Docker, and set up basic monitoring. Happy MLOps-ing!
Ever deployed a model to production? Share your wins or lessons learned!