This project is a hands-on DevOps/MLOps exercise focused on deploying a HuggingFace LoRA-based sentiment analysis model using best practices for productionization, CI/CD, security, monitoring, and scalable infrastructure on Kubernetes. The goal is to demonstrate a full ML product lifecycle, from model training to robust, observable, and secure deployment.
- Model: Fine-tuned HuggingFace transformer (LoRA) for movie sentiment analysis
- API: FastAPI backend for inference, metrics, and health endpoints
- UI: Gradio frontend for interactive predictions
- Dependency Management: Poetry for streamlined dependency resolution, virtual environment isolation, and reproducible builds
- Containerization: Separate Docker containers for backend and UI, managed with Docker Compose for local development and testing purposes
- Artifact Management: Self-hosted Docker registry used for storing container images and Helm charts
- Orchestration: Kubernetes manifests and Helm charts for scalable, production-grade deployment on an on-prem cluster with 3 GPU enabled nodes
- Networking: Ingress controller configured for HTTP routing to backend and UI services
- CI/CD: GitHub Actions with a self-hosted runner, automated builds, tests (orchestrated with Nox), and releases
- Testing: Unit, integration, and coverage reports
- Code Quality: Linting (ruff), type checking (mypy), security (bandit, pip-audit)
- Monitoring: Prometheus metrics, Loki logs, and Grafana dashboards. Set-up of the monitoring environment can be found here central monitoring repository
- Security: Basic auth for metrics, secrets management, and security scanning
src/ # Main application code (FastAPI, Gradio, model utils)
docker/ # Dockerfiles for backend, frontend, and test jobs
helm-chart-sentiment-app/ # Helm chart for Kubernetes deployment
k8s/ # Raw Kubernetes manifests
tests/ # Unit and integration tests
promtail/ # Configuration for promtail log scraping
.github/workflows/ # CI/CD pipeline
notebooks/ # Model training and experimentation
docs/img/ # Architecture, infra, and dashboard diagrams
Makefile # Common build/test commands
pyproject.toml # Poetry project config and dependencies
poetry.lock # Poetry lock file
noxfile.py # Nox automation for test/lint workflows
docker-compose.yml # Local multi-container orchestration
TODO # Project plan and future roadmap
Local deployment - was established and used before the K8s cluster
-
Clone the repo
-
Set environment variables (example):
export REGISTRY_URL=local export PROMETHEUS_METRICS_USER=admin export PROMETHEUS_METRICS_PASS=secret export FASTAPI_URL=http://fastapi:8000 export LOKI_URL=http://localhost:3100
-
Build and start Docker images using Docker Compose:
REGISTRY_URL=$REGISTRY_URL docker compose -f docker-compose.yml up --build
-
Access the services:
- FastAPI: http://localhost:8000
- Gradio UI: http://localhost:7860
Note:
- The CI/CD pipeline in
.github/workflows/ci-cd.yaml
demonstrates how to authenticate and push images to a self-hosted registry. For local development, the above steps avoid pushing the images to a repository for simplicity.- Make sure your environment variables match those expected in
docker-compose.yml
.
-
Prerequisites
- An on-prem Kubernetes cluster with GPU-enabled nodes
- Access to a self-hosted Docker image registry
- Helm installed locally
-
Clone the repo
-
Build and push Docker images to your self-hosted registry
- Set the
REGISTRY_URL
environment variable to your registry (e.g.,export REGISTRY_URL=your.registry:5000
) - Authenticate to your registry if required (see CI/CD pipeline for example)
- Build and push images:
REGISTRY_URL=$REGISTRY_URL docker compose -f docker-compose.yml build REGISTRY_URL=$REGISTRY_URL docker compose -f docker-compose.yml push
- Set the
-
Customize Helm values
- Edit
helm-chart-sentiment-app/values.yaml
values to your preferences or leave as is for default - Create a
values-secret.yaml
file which contains:image_repository: "your.registry:5000" fast_api_secret_prometheus_user: base64 encoded user fast_api_secret_prometheus_password: base64 encoded password
- Edit
-
Install the Helm chart
helm upgrade --install sentiment-app ./helm-chart-sentiment-app -f values-secret.yaml
-
Monitor and manage your deployment
- Use
kubectl
,k9s
, andhelm
to manage and troubleshoot your deployment.
- Use
Note:
- The images referenced in your Helm values must be accessible by your Kubernetes cluster (i.e., pushed to a registry the cluster can pull from).
- The CI/CD pipeline in
.github/workflows/ci-cd.yaml
provides a full example of this workflow, including registry authentication and Helm chart packaging.
-
Run all tests, linting, and security scans (locally):
make run-nox-tests
-
Generate test coverage report (locally):
make run-pytests-coverage-html
-
Coverage report:
- Prometheus: Scrapes FastAPI
/metrics
endpoint for custom and default metrics - Loki & Promtail: Collects logs from all Docker containers
- Grafana: Dashboards for metrics and logs
- Bandit and pip-audit for static analysis and dependency scanning
- Basic authentication for sensitive endpoints
- Secrets managed via Kubernetes secrets and GitHub secrets
- Python 3.10, FastAPI, Gradio, HuggingFace Transformers, LoRA, PyTorch
- Docker, Docker Compose, Poetry
- Kubernetes, Helm
- Prometheus, Grafana, Loki, Promtail
- GitHub Actions
- Nox, pytest, ruff, mypy, bandit, pip-audit
See the TODO
file for a detailed roadmap, including:
- Set-up and enable monitoring on the K8s cluster
- Explore ELK stack and potential use cases for Kafka
- Implement KServe for easier model deployment
- Advanced CI/CD (feature branch workflows, alternative tools like Jenkins, ArgoCD, CircleCI)
- More strict code styling and pre-commit hooks
- Cloud deployment using Terraform
- Model tracking, data drift, and automated retraining (Evidently)
- ML serving at scale (TorchServe, Triton, etc.)
- Introduce DVC for storing and versioning data and model weights
- Explore distributed computing and GPU accelerated data science (Spark, Dask, CuDF, CuML)
- Modularize the existing training notebook
- Documentation with MkDocs/Sphinx