This project demonstrates a containerized machine learning deployment using Ray Serve, featuring a scikit-learn Iris classifier, custom Prometheus metrics, and a pre-configured Grafana dashboard.
- `models/`: Serialized model artifacts (`.pkl`).
- `data/`: Model metadata and labels (`.json`).
- `monitoring/`: Prometheus and Grafana configuration.
- `serve_model.py`: Core deployment logic with custom metrics and health checks.
- `serve_config.yaml`: Ray Serve deployment configuration.
- `locustfile.py`: Load testing script for simulating traffic.
- `docker-compose.yml`: Full-stack orchestration (Ray, Prometheus, Grafana, Locust).
Start the entire environment (Ray Serve, Prometheus, and Grafana) in detached mode:
```bash
docker compose up --build -d
```

Wait about 20-30 seconds for the Ray cluster and Serve application to fully initialize.
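If you'd rather not guess at the timing, you can poll the Serve HTTP proxy until it reports healthy. A minimal sketch, assuming the proxy is published on localhost:8000 and that your Ray version exposes the proxy's `/-/healthz` endpoint (adjust the URL if your setup differs):

```python
import time
import urllib.error
import urllib.request


def wait_for_serve(url: str = "http://localhost:8000/-/healthz", timeout: float = 90.0) -> None:
    """Poll the Serve proxy's health endpoint until it responds or we time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    print("Serve is ready.")
                    return
        except (urllib.error.URLError, OSError):
            pass  # proxy not up yet; retry
        time.sleep(2)
    raise TimeoutError(f"Serve did not become healthy within {timeout}s")


if __name__ == "__main__":
    wait_for_serve()
```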
Use the provided test client to send a prediction request:
```bash
uv run python query_model.py
```
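query_model.py is the source of truth for the request format. For reference, a hand-rolled client might look like this sketch (the `features` payload shape and the root route are assumptions; adjust them to match what serve_model.py actually expects):

```python
import json
import urllib.request

# Hypothetical payload: the four Iris features
# (sepal length, sepal width, petal length, petal width).
payload = {"features": [5.1, 3.5, 1.4, 0.2]}

req = urllib.request.Request(
    "http://localhost:8000/",  # Serve HTTP proxy; route may differ in serve_config.yaml
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # e.g. {"prediction": "setosa"}
```

Once requests are flowing, the stack exposes three monitoring endpoints: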
- Ray Dashboard: http://localhost:8265 — Monitor cluster status, logs, and integrated Grafana metrics (under the "Metrics" tab).
- Grafana: http://localhost:3000 — Dedicated visualization platform.
  - Default login: `admin`/`admin` (anonymous viewing enabled).
  - Dashboard: Navigate to Dashboards -> Ray -> Default Dashboard.
- Prometheus: http://localhost:9090 — Query raw metrics (e.g., `rate(ray_iris_predictions_total[1m])` for prediction throughput).
The application exports the following metrics:
- `ray_iris_predictions_total`: Counter of predictions, labeled by Iris species.
- `ray_iris_prediction_latency_ms`: Histogram of prediction processing time.
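The actual definitions live in serve_model.py. The sketch below shows how such metrics are typically declared with Ray Serve's metrics API; the model path, label list, and histogram buckets here are assumptions:

```python
import pickle
import time

from ray import serve
from ray.serve import metrics
from starlette.requests import Request


@serve.deployment
class IrisClassifier:
    def __init__(self, model_path: str = "models/iris.pkl"):  # hypothetical path
        with open(model_path, "rb") as f:
            self.model = pickle.load(f)
        self.labels = ["setosa", "versicolor", "virginica"]
        # Ray adds a "ray_" prefix when exporting to Prometheus, which is how
        # these surface as ray_iris_predictions_total / ray_iris_prediction_latency_ms.
        self.predictions = metrics.Counter(
            "iris_predictions_total",
            description="Predictions served, labeled by Iris species.",
            tag_keys=("species",),
        )
        self.latency_ms = metrics.Histogram(
            "iris_prediction_latency_ms",
            description="Prediction processing time in milliseconds.",
            boundaries=[1, 2, 5, 10, 25, 50, 100],  # assumed bucket edges
        )

    async def __call__(self, request: Request) -> dict:
        features = (await request.json())["features"]
        start = time.perf_counter()
        species = self.labels[int(self.model.predict([features])[0])]
        elapsed_ms = (time.perf_counter() - start) * 1000
        self.predictions.inc(tags={"species": species})
        self.latency_ms.observe(elapsed_ms)
        return {"prediction": species}


app = IrisClassifier.bind()
```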
To simulate traffic and see the metrics in the Ray Dashboard and Grafana (a sketch of a minimal locustfile follows these steps):
- Open Locust: http://localhost:8089
- Number of users: 10
- Spawn rate: 2
- Host: `http://ray-head:8000`
- Start swarming.
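locustfile.py defines the actual traffic pattern; a minimal equivalent looks roughly like this (the payload and wait times are assumptions):

```python
from locust import HttpUser, between, task


class IrisUser(HttpUser):
    # Each simulated user pauses 0.5-2s between requests (assumed pacing).
    wait_time = between(0.5, 2)

    @task
    def predict(self):
        # Posts against the Host configured in the UI (http://ray-head:8000).
        # Hypothetical payload; mirror what query_model.py sends.
        self.client.post("/", json={"features": [5.1, 3.5, 1.4, 0.2]})
```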
To stop all services and remove containers:
```bash
docker compose down -v
```