Enterprise-grade LLM API Gateway with built-in privacy protection, caching, and observability
- 🔒 Privacy First: Automatic PII detection and anonymization using Microsoft Presidio
- ⚡ Smart Caching: Redis-powered response caching to reduce API costs
- 📊 Full Observability: Comprehensive Prometheus metrics + Grafana dashboards
- 🛡️ Enterprise Security: API key authentication with rate limiting
- 🔌 Multi-Provider Ready: Extensible architecture for multiple LLM providers
- 🚀 OpenAI Compatible: Drop-in replacement for OpenAI API endpoints
```
┌─────────────┐    ┌──────────────┐    ┌─────────────┐    ┌─────────────┐
│   Client    │───▶│     DLP      │───▶│Rate Limiter │───▶│    Auth     │
└─────────────┘    │  Middleware  │    │ Middleware  │    │ Middleware  │
                   └──────────────┘    └─────────────┘    └─────────────┘
                                                                 │
                                                                 ▼
┌─────────────┐    ┌──────────────┐    ┌─────────────┐    ┌─────────────┐
│    Redis    │◀───│    Cache     │◀───│  Provider   │◀───│   Gateway   │
│    Cache    │    │    Layer     │    │   Factory   │    │   Handler   │
└─────────────┘    └──────────────┘    └─────────────┘    └─────────────┘
```
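The request path in the diagram maps onto FastAPI middleware. Below is a minimal sketch of that wiring, not this repository's actual code: the function names and bodies are placeholders. Note that Starlette executes the most recently registered middleware first, so the outermost DLP layer is registered last.

```python
# Illustrative only: placeholder middleware mirroring the diagram's request
# path (Client -> DLP -> Rate Limiter -> Auth). Starlette runs the most
# recently registered middleware first, hence the registration order below.
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def auth_middleware(request: Request, call_next):
    # Placeholder: validate the X-API-Key header here.
    return await call_next(request)

@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    # Placeholder: enforce the per-key rate limit here.
    return await call_next(request)

@app.middleware("http")
async def dlp_middleware(request: Request, call_next):
    # Placeholder: detect and anonymize PII in the request body here.
    return await call_next(request)
```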
```bash
# Clone the repository
git clone https://github.com/ozanunal0/prometheus-gateway.git
cd prometheus-gateway

# Set your OpenAI API key
echo "OPENAI_API_KEY=your_key_here" > .env

# Start all services
docker-compose up -d

# Create an API key
docker-compose exec gateway python create_key.py [email protected]
```
```bash
# Set up the environment
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m spacy download en_core_web_lg

# Run the gateway
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```
```bash
# Make a request (OpenAI-compatible)
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_api_key_here" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
```
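Because the gateway exposes OpenAI-compatible endpoints, the official `openai` Python client (v1+) can point at it directly. A minimal sketch, assuming the gateway reads the `X-API-Key` header shown above rather than the standard `Authorization` header:

```python
# Sketch: openai>=1.0 client pointed at the local gateway. The api_key value
# is a dummy; the gateway authenticates via the X-API-Key header instead.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="unused",
    default_headers={"X-API-Key": "your_api_key_here"},
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```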
Access your monitoring stack (a quick programmatic check is sketched after the list):
- Gateway: http://localhost:8000
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (default login: admin/admin)
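For quick checks without opening Grafana, Prometheus's HTTP API can be queried directly. The gateway's own metric names are repo-specific, so this sketch queries the built-in `up` series to confirm scrape targets are healthy:

```python
# Query the Prometheus HTTP API directly. "up" exists on every Prometheus
# install; substitute the gateway's metric names once you know them.
import requests

resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": "up"},
    timeout=5,
)
for result in resp.json()["data"]["result"]:
    print(result["metric"].get("instance"), result["value"][1])
```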
| Environment Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key | Required |
| `REDIS_HOST` | Redis hostname | `localhost` |
| `REDIS_PORT` | Redis port | `6379` |
| `REDIS_PASSWORD` | Redis password | None |
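For reference, the variables in the table map onto environment lookups like the following. This is a sketch mirroring the documented defaults, not the repository's actual config module:

```python
# Sketch only: environment lookups mirroring the table's defaults. The
# gateway's real configuration module may use pydantic settings or similar.
import os

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]         # required, raises if unset
REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = int(os.getenv("REDIS_PORT", "6379"))
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD") or None  # default: no password
```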
- PII Detection: Automatically detects and anonymizes emails, phone numbers, SSNs, and credit card numbers (see the Presidio sketch after this list)
- Secure API Keys: SHA-256 hashed storage, never logged in plaintext
- Rate Limiting: Configurable per-key rate limits (default: 10/minute)
- Request Isolation: Each request is processed independently
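To get a feel for what the DLP layer does, here is a standalone Presidio example using the same libraries and the spaCy model installed earlier. The gateway's own scrubbing code may configure entities and anonymizers differently:

```python
# Standalone Presidio demo approximating the gateway's PII scrubbing.
# Requires presidio-analyzer, presidio-anonymizer, and en_core_web_lg.
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

text = "Email [email protected] or call 212-555-0182."
findings = analyzer.analyze(text=text, language="en")
print(anonymizer.anonymize(text=text, analyzer_results=findings).text)
# e.g. "Email <EMAIL_ADDRESS> or call <PHONE_NUMBER>."
```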
Track your usage with built-in metrics (an illustrative sketch of their definitions follows the list):
- Request counts by owner, model, and status
- Response latency histograms
- Token usage tracking (prompt, completion, total)
- Error rates and performance insights
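The metrics above translate naturally into `prometheus_client` collectors. The names and label sets below are illustrative assumptions, not necessarily what the gateway registers:

```python
# Illustrative prometheus_client definitions matching the bullets above;
# the gateway's actual metric names and labels may differ.
from prometheus_client import Counter, Histogram

REQUESTS = Counter(
    "llm_requests_total", "LLM requests processed",
    ["owner", "model", "status"],
)
LATENCY = Histogram(
    "llm_request_latency_seconds", "End-to-end request latency", ["model"],
)
TOKENS = Counter(
    "llm_tokens_total", "Tokens used",
    ["model", "type"],  # type: prompt | completion | total
)
```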
Add new LLM providers by subclassing `LLMProvider`:

```python
from app.providers.base import LLMProvider


class YourProvider(LLMProvider):
    async def chat_completions(self, request):
        # Translate the incoming OpenAI-style request to your provider's
        # API, call it, and return an OpenAI-compatible response.
        pass
```
This project is in active development. Contributions are welcome!
This project is built with a "security-first" mindset, incorporating best practices like PII scrubbing, authentication, and rate limiting. However, it is an open-source tool provided "as-is" and has not been subjected to a professional, third-party security audit.
Security is complex. While Prometheus Gateway can be a powerful tool to enhance your security posture for LLM applications, it is not a substitute for a comprehensive security assessment of your own infrastructure and processes. Please perform your own security validation before using this project in production environments with highly sensitive data.
I welcome feedback, vulnerability reports, and contributions from the security community to make this project more robust.