A comprehensive platform for deploying and managing Generative AI applications on OpenShift, featuring multiple model serving runtimes, vector databases, object storage, API gateways, web interfaces, monitoring, and testing suites.
This project provides a complete, production-ready infrastructure for deploying Generative AI applications on OpenShift. It includes environment preparation, multiple LLM serving backends, vector databases for embeddings, S3-compatible storage, API gateways for unified access, user-friendly GUIs, comprehensive monitoring stacks, and extensive load/performance testing capabilities.
- Environment Preparation: Automated setup and cleanup scripts for OpenShift environments
- GitOps Integration: ArgoCD configurations for continuous deployment
- Multiple LLM Backends: Support for Ollama (CPU), vLLM (CPU/GPU), and NVIDIA NIM (GPU)
- Vector Databases: Milvus for high-performance vector similarity search
- Object Storage: MinIO S3-compatible storage for models and data
- API Gateways: LiteLLM for unified model API access
- Web GUIs: AnythingLLM and OpenWebUI for intuitive AI interaction and document management
- Monitoring Stack: Grafana dashboards and Prometheus metrics for GPU and model performance
- Load Testing: Comprehensive test suite including smoke, stress, spike, and performance tests
- LLM Performance Testing: Specialized benchmarks for model inference and throughput
- Infrastructure Automation: Scripts for automated deployment and resource management
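To give a feel for what the load- and performance-testing features above measure, here is a minimal latency-benchmark sketch. It is purely illustrative (the real suites live under `tests/`); the function names and iteration count are invented for this example.

```python
import statistics
import time


def benchmark(fn, iterations=20):
    """Call fn repeatedly and return simple per-call latency stats in seconds.

    This mirrors the basic shape of a smoke/performance test: run a request
    many times, then summarize mean and tail latency.
    """
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    ordered = sorted(samples)
    return {
        "mean": statistics.mean(samples),
        "p95": ordered[int(0.95 * (len(ordered) - 1))],
    }


if __name__ == "__main__":
    # In a real test, fn would issue an inference request to a model endpoint.
    stats = benchmark(lambda: sum(range(1000)))
    print(f"mean={stats['mean']:.6f}s p95={stats['p95']:.6f}s")
```

In the actual suites, the measured callable would be an HTTP request against a deployed model endpoint rather than a local function.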
- Ollama: Lightweight runtime for CPU-based model serving
- vLLM: High-performance serving runtime with GPU acceleration
- NVIDIA NIM: Optimized microservices for NVIDIA GPU deployments
- Milvus: Cloud-native vector database for similarity search and embeddings
- MinIO: S3-compatible object storage for models, documents, and artifacts
- LiteLLM: Unified API gateway for accessing multiple LLM providers
- AnythingLLM: Web-based GUI for document management and AI chat interactions
- OpenWebUI: Alternative web interface for AI model interactions
- Grafana: Dashboards for monitoring GPU usage, model performance, and system metrics
- Prometheus: Metrics collection and alerting system
- Load Testing Suite: Smoke, stress, spike, and performance tests
- LLM Performance Testing: Specialized benchmarks for model inference and throughput
- Benchmarking Tools: Model performance and throughput testing
- ArgoCD Configurations: GitOps manifests for continuous deployment
- Infrastructure Scripts: Automated setup and cleanup utilities
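Because LiteLLM exposes an OpenAI-compatible API, clients talk to every backend (Ollama, vLLM, NVIDIA NIM) through one request format. The sketch below builds such a request payload; the gateway URL, model name, and API key are placeholders, not values from this repository.

```python
import json

# Hypothetical in-cluster route to the LiteLLM gateway -- substitute your own.
GATEWAY_URL = "http://litellm.example.svc:4000/v1/chat/completions"


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload accepted by LiteLLM."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


if __name__ == "__main__":
    payload = build_chat_request("ollama/llama3", "Summarize this document.")
    print(json.dumps(payload, indent=2))
    # To actually send it (requires a running gateway and a valid key):
    # import urllib.request
    # req = urllib.request.Request(
    #     GATEWAY_URL,
    #     data=json.dumps(payload).encode(),
    #     headers={"Content-Type": "application/json",
    #              "Authorization": "Bearer <your-litellm-key>"},
    # )
    # print(urllib.request.urlopen(req).read().decode())
```

Switching between backends then amounts to changing the `model` string, while the request shape stays the same.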
- Kubernetes Cluster: OpenShift 4.x+ (preferred) or vanilla Kubernetes
- CLI Tools: `kubectl` or `oc` installed and configured
- Storage: Sufficient persistent storage for models and data
- Compute Resources: CPU or GPU nodes depending on deployment type
- Access: Cluster admin access for namespace and resource creation
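The CLI prerequisites above can be verified with a short check before starting a deployment. This is a sketch, not part of the repository's automation; adapt the tool list to your environment.

```python
import shutil


def check_cli_tools(tools=("oc", "kubectl")) -> dict:
    """Return a mapping of tool name -> True if the tool is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_cli_tools().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```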
- NVIDIA GPUs with appropriate drivers
- NGC API key for NVIDIA NIM models
- NVIDIA Developer Program membership
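NIM deployments typically read the NGC API key from a Kubernetes Secret. The helper below builds such a Secret manifest; the namespace, secret name, and key name are assumptions for illustration and must match whatever your NIM manifests actually reference.

```python
import base64


def ngc_secret_manifest(api_key: str, namespace: str = "nim") -> dict:
    """Build a Kubernetes Secret manifest carrying an NGC API key.

    Secret names and the data key here are hypothetical examples.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "ngc-api-key", "namespace": namespace},
        "type": "Opaque",
        "data": {
            # Kubernetes requires Secret data values to be base64-encoded.
            "NGC_API_KEY": base64.b64encode(api_key.encode()).decode(),
        },
    }
```

Serialized to YAML or JSON, the result can be applied with `oc apply -f -` (or `kubectl apply -f -`).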
```
genai-application/
├── docs/                     # Documentation and images
│   └── images/               # Documentation images and diagrams
├── env_preparation/          # Environment setup and cleanup scripts
├── gitops/                   # GitOps configurations (ArgoCD)
├── models/                   # LLM model deployments
│   ├── nvidia_nim/           # NVIDIA NIM GPU models
│   ├── ollama/               # Ollama CPU models
│   └── vllm/                 # vLLM CPU/GPU models
├── monitoring_alerting/      # Monitoring stack and alerting rules
├── rag_usecase/              # RAG-specific configurations (example use case)
├── s3_storage/               # S3-compatible storage deployments
│   └── minio_on_openshift/   # MinIO storage deployment
├── tests/                    # Testing suites and performance benchmarks
│   ├── last_und_performance/ # Load and performance tests
│   └── llm_performance/      # LLM-specific performance testing
├── vectordb/                 # Vector database deployments
│   └── milvus/               # Milvus vector database
├── web_interfaces/           # Web GUI deployments
│   ├── anythingllm/          # AnythingLLM GUI deployment
│   └── openwebui/            # OpenWebUI interface
├── gpu_deployment.md         # GPU deployment guide
├── infra_preparation_auto.sh # Infrastructure automation script
├── LICENSE                   # Apache 2.0 License
├── README.md                 # This file
└── ROADMAP.md                # Project roadmap
```
We welcome contributions! Please see our roadmap for planned features.
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
- Follow Kubernetes best practices for manifests
- Include documentation for new components
- Add tests for new functionality
- Update the roadmap for significant changes
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
This project builds upon and includes components from the projects listed above, including Ollama, vLLM, NVIDIA NIM, Milvus, MinIO, LiteLLM, AnythingLLM, OpenWebUI, Grafana, and Prometheus.
For issues and questions:
- Check existing issues
- Create a new issue with detailed information
- Review component-specific READMEs for troubleshooting
Note: This platform is optimized for OpenShift clusters and provides a foundation for various Generative AI use cases including RAG, chatbots, content generation, and more. Support for other Kubernetes distributions may require modifications.
