
Deploying AI models to production requires reliable processes that bridge research and engineering. This article covers practical MLOps best practices: model versioning, reproducible experiments with MLflow, packaging and containerizing models, CI/CD for model deployment, and monitoring in production.
Key concepts you'll get from this article:
- Model and data versioning
- Reproducible experiments using MLflow
- Packaging models and containerization
- CI/CD for model deployment
- Monitoring, alerting, and automated retraining
Treat models and datasets as first-class artifacts. Use a tracking system (MLflow, DVC, or a feature store) to record runs, hyperparameters, metrics, and dataset versions. This makes it possible to reproduce and compare experiments.
Minimal MLflow example (Python)
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

mlflow.set_experiment('iris-demo')

with mlflow.start_run() as run:
    clf = RandomForestClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    acc = clf.score(X_test, y_test)
    mlflow.log_metric('accuracy', acc)
    mlflow.sklearn.log_model(clf, 'model')

This stores model artifacts and metrics you can later query and promote to production.
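Registering and promoting a model (Python, illustrative)

Promotion can be scripted against the MLflow Model Registry. The sketch below is a minimal example, assuming a registry-backed tracking server is configured; the registered model name 'iris-demo' is an assumption, and it reuses the run object from the example above.

import mlflow
from mlflow.tracking import MlflowClient

# Register the model logged in the run above under a registry name (assumed: 'iris-demo')
model_uri = f'runs:/{run.info.run_id}/model'
result = mlflow.register_model(model_uri, 'iris-demo')

# Promote the new version to Production once it passes validation
client = MlflowClient()
client.transition_model_version_stage(name='iris-demo', version=result.version, stage='Production')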
Pin library versions and capture environment details (requirements.txt or poetry.lock). Consider packaging experiments into containers so runs are reproducible across environments.
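Capturing the environment with the run (Python, illustrative)

As one option, the frozen dependency list can be attached to the MLflow run itself, so the exact environment travels with the model. This is a minimal sketch that shells out to pip; in practice you would log it inside the same run as the model, and adapt it if you use poetry or conda.

import subprocess
import mlflow

with mlflow.start_run():
    # Freeze the exact package versions active for this run
    frozen = subprocess.run(['pip', 'freeze'], capture_output=True, text=True).stdout
    with open('requirements.lock.txt', 'w') as f:
        f.write(frozen)
    # Attach the lock file so the environment can be recreated later
    mlflow.log_artifact('requirements.lock.txt')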
Sample Dockerfile for a model service
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY ./app /app
CMD ["python", "server.py"]Use a minimal model-serving layer (FastAPI, Flask) that loads the tracked model artifact. Keep inference code tiny and deterministic: pre-processing, model.forward, post-processing.
Simple FastAPI loader (Python)
from fastapi import FastAPI
import mlflow.pyfunc

app = FastAPI()

# Load the registered model version currently in the Production stage
model = mlflow.pyfunc.load_model('models:/iris-demo/Production')

@app.post('/predict')
def predict(payload: dict):
    # payload -> pandas DataFrame or list
    preds = model.predict(payload['data'])
    return {'predictions': preds.tolist()}

Treat model artifacts like code: build, test, and deploy them through pipelines. Example pipeline steps:
- Run unit + integration tests (data schema, invariants)
- Run training and register the model in MLflow
- Run model validation tests (performance, fairness checks); see the gating-test sketch after this list
- If validation passes, promote the model to a deployment stage (e.g., Production)
- Trigger the infrastructure deploy (Kubernetes rollout, serverless function update)
Example GitHub Actions fragment (conceptual)
name: model-deploy
on:
  push:
    branches: [ main ]
jobs:
  test-and-register:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest -q
      - name: Train & register model
        run: python train_and_register.py

In production, monitor both system and model metrics:
- Latency and error rates (system)
- Model performance (drift, accuracy on sampled labeled data)
- Input data distribution shifts
Collect model predictions and a small sample of ground truth labels to compute performance metrics. Use tools like Prometheus + Grafana for metrics and Sentry for exceptions.
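Exporting serving metrics (Python, illustrative)

As an illustration, system-level metrics can be exported straight from the serving process with the prometheus_client library; the metric names and port below are assumptions, not a standard.

import time
from prometheus_client import Counter, Histogram, start_http_server

PREDICTIONS = Counter('model_predictions_total', 'Number of predictions served')
LATENCY = Histogram('model_prediction_latency_seconds', 'Prediction latency in seconds')

def timed_predict(model, features):
    # Wrap the model call so every prediction updates the exported metrics
    start = time.time()
    preds = model.predict(features)
    LATENCY.observe(time.time() - start)
    PREDICTIONS.inc()
    return preds

# Expose /metrics for Prometheus to scrape (port chosen arbitrarily)
start_http_server(8001)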
When drift is detected or performance degrades, a retraining pipeline can automatically kick off (after gating and human review). Keep retraining reproducible and log datasets and model versions.
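Simple input drift check (Python, illustrative)

A drift check can be as simple as comparing a feature's live distribution against the training distribution. The sketch below uses scipy's two-sample Kolmogorov-Smirnov test; the p-value threshold and the retraining hook are assumptions.

import numpy as np
from scipy.stats import ks_2samp

def feature_has_drifted(reference: np.ndarray, live: np.ndarray, p_threshold: float = 0.01) -> bool:
    # Return True if the live values differ significantly from the reference sample
    _, p_value = ks_2samp(reference, live)
    return p_value < p_threshold

# Example gating: only kick off retraining (after human review) when drift is confirmed
# if feature_has_drifted(train_feature_column, recent_feature_column):
#     trigger_retraining_pipeline()  # hypothetical hook into your orchestrator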
Checklist / Best practices
- Track everything: experiments, artifacts, data versions.
- Keep inference code small and well-tested.
- Use containers for reproducible deployments.
- Automate validation and gating in CI pipelines.
- Monitor inputs and outputs and surface alerts.
- Have a rollback plan and model versioning in place.
MLOps is the engineering glue that makes ML models reliable and maintainable in production. Start small: track experiments, containerize a model, add CI tests, and instrument monitoring. Iterate on automating retraining once confidence and observability are in place.