A systematic study of production bugs and performance issues in three major open-source LLM serving frameworks: vLLM, llama.cpp, and SGLang.
This project analyzes 12,349 GitHub issues across three leading LLM serving frameworks to identify common production failure modes, bug patterns, and performance bottlenecks. Our goal is to improve the reliability and robustness of LLM deployments in production environments.
- vLLM: 4,078 issues analyzed (2,225 bugs/performance issues)
- llama.cpp: 5,470 issues analyzed (2,601 bugs/performance issues)
- SGLang: 2,567 issues analyzed (106 bugs/performance issues)
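For a quick sanity check, the per-framework bug share can be recomputed from the counts above with plain Python (no project code required):

```python
# Recompute the bug/performance share per framework from the counts above.
counts = {
    "vLLM": (4078, 2225),
    "llama.cpp": (5470, 2601),
    "SGLang": (2567, 106),
}

for name, (total, bugs) in counts.items():
    print(f"{name}: {bugs}/{total} issues flagged ({bugs / total:.1%})")
```

This puts the flagged-issue rate at roughly 55% for vLLM, 48% for llama.cpp, and 4% for SGLang.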
- Identify and categorize common production bugs in LLM serving systems
- Create a comprehensive bug taxonomy for LLM infrastructure
- Analyze critical failure modes and their root causes
- Provide actionable recommendations for improving production reliability
- Publish findings to help the community build more robust LLM systems
.
├── docs/
│   └── research_paper/              # Research documentation and findings
│       ├── research_plan.md
│       ├── bug_taxonomy.md
│       ├── critical_bugs_analysis.md
│       └── paper_outline.md
├── scripts/                         # Data collection and analysis tools
│   ├── scrape_github_issues.py
│   ├── analyze_issues.py
│   └── issues_datasets/             # JSON datasets of scraped issues
├── vllm/                            # vLLM framework demo and testing
├── llama-cpp-demo/                  # llama.cpp framework demo
├── sglang-demo/                     # SGLang framework demo
└── scheduler/                       # Additional research components
Our analysis identified 7 major categories of production bugs:
- Memory Management Issues - OOM errors, memory leaks, VRAM fragmentation
- Concurrency & Synchronization - Race conditions, deadlocks, cache corruption
- GPU/CUDA Issues - Device errors, multi-GPU synchronization problems
- API/Protocol Issues - Request handling failures, streaming errors
- Model-Specific Bugs - Loading failures, inference errors
- Performance Degradation - Latency spikes, throughput bottlenecks
- Scaling & Distribution - Multi-GPU coordination, cluster issues
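To make the taxonomy concrete, here is a toy keyword-based classifier in the spirit of our labeling pass. The keyword lists are illustrative assumptions, not the heuristics actually used in scripts/analyze_issues.py:

```python
from typing import List

# Illustrative keyword lists only; the project's real classifier lives in
# scripts/analyze_issues.py and may use different heuristics.
CATEGORY_KEYWORDS = {
    "Memory Management": ["oom", "out of memory", "memory leak", "vram"],
    "Concurrency & Synchronization": ["race condition", "deadlock", "hang"],
    "GPU/CUDA": ["cuda", "nccl", "illegal memory access", "device-side assert"],
    "API/Protocol": ["streaming", "openai api", "http 500"],
    "Performance Degradation": ["slow", "latency", "throughput", "regression"],
}

def classify(title: str) -> List[str]:
    """Return every category whose keywords appear in an issue title."""
    text = title.lower()
    return [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]

print(classify("CUDA error: out of memory when batch size > 32"))
# ['Memory Management', 'GPU/CUDA']
```

Note that a single issue can land in multiple categories, which matches how real reports mix symptoms (an OOM that surfaces as a CUDA error, for example).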
- Python 3.8+
- Git
- (Optional) CUDA-capable GPU for framework testing
- Clone the repository:
git clone https://github.com/yunwei37/vllm-exp.git
cd vllm-exp
- Set up Python environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
- (Optional) Set up framework demos:
# For vLLM demo
cd vllm && bash quick_start.sh
# For llama.cpp demo
cd llama-cpp-demo && bash quick_start.sh
# For SGLang demo
cd sglang-demo && bash quick_start.sh
- Scrape issues from GitHub (requires a GitHub token):
python scripts/scrape_github_issues.py --repo vllm-project/vllm --output vllm_issues.json
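A scraper of this kind essentially pages through the GitHub REST API's issues endpoint. Below is a minimal self-contained sketch; the actual scrape_github_issues.py adds CLI arguments, rate-limit handling, and richer filtering, and `GITHUB_TOKEN` is assumed to be set in the environment:

```python
import json
import os

import requests

def fetch_issues(repo, token):
    """Page through the GitHub REST API and collect all issues for a repo."""
    issues, page = [], 1
    headers = {"Authorization": f"token {token}"}
    while True:
        resp = requests.get(
            f"https://api.github.com/repos/{repo}/issues",
            headers=headers,
            params={"state": "all", "per_page": 100, "page": page},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        # The /issues endpoint also returns pull requests; skip them.
        issues.extend(i for i in batch if "pull_request" not in i)
        page += 1
    return issues

if __name__ == "__main__":
    data = fetch_issues("vllm-project/vllm", os.environ["GITHUB_TOKEN"])
    with open("vllm_issues.json", "w") as f:
        json.dump(data, f)
```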
- Analyze scraped issues:
python scripts/analyze_issues.py --input vllm_issues.json --output analysis_results.json
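At its core, the analysis step aggregates over the scraped JSON. Here is a minimal sketch that tallies GitHub labels; the field names (`labels`, `name`) follow the GitHub API schema, and the project's analyze_issues.py layers the taxonomy above on top of counts like these:

```python
import json
from collections import Counter

# Tally GitHub labels across the scraped issues.
with open("vllm_issues.json") as f:
    issues = json.load(f)

label_counts = Counter(
    label["name"] for issue in issues for label in issue.get("labels", [])
)

for name, count in label_counts.most_common(10):
    print(f"{count:5d}  {name}")
```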
See our comprehensive research documentation:
- [Research Plan](docs/research_paper/research_plan.md) - Overview of methodology
- [Bug Taxonomy](docs/research_paper/bug_taxonomy.md) - Detailed bug categorization
- [Critical Bugs Analysis](docs/research_paper/critical_bugs_analysis.md) - Deep dive into severe issues
- Memory management is the leading cause of production failures across all three frameworks
- Concurrency bugs are especially hard to diagnose because their non-deterministic behavior resists reproduction
- GPU/CUDA issues often manifest only under high load or on specific hardware
- API compatibility problems frequently surface during framework upgrades
- Performance regressions are common but often go unnoticed until they reach production
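For the last point, one cheap mitigation is a latency gate in CI that fails when a benchmark drifts past a stored baseline. A minimal sketch follows; the baseline value, 20% threshold, and fake workload are all illustrative placeholders:

```python
import statistics
import time

THRESHOLD = 1.20     # fail if p50 exceeds baseline by >20% (illustrative)
BASELINE_P50 = 0.85  # seconds; would normally come from a stored baseline

def measure_p50(run_once, trials=20):
    """Median wall-clock latency of `run_once` over `trials` runs."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def fake_request():
    time.sleep(0.01)  # stand-in for a real inference request

p50 = measure_p50(fake_request)
assert p50 <= BASELINE_P50 * THRESHOLD, (
    f"latency regression: p50={p50:.3f}s exceeds "
    f"{THRESHOLD:.0%} of baseline {BASELINE_P50:.3f}s"
)
```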
We welcome contributions! Areas where help is needed:
- Expanding the bug taxonomy with new categories
- Adding analysis for more LLM frameworks
- Improving issue classification accuracy
- Creating automated bug detection tools
If you use this research in your work, please cite:
@misc{llm-production-bugs-2025,
  title={A Systematic Study of Production Bugs in LLM Serving Frameworks},
  author={[Authors]},
  year={2025},
  url={https://github.com/yunwei37/vllm-exp}
}
This project is licensed under the MIT License - see the LICENSE file for details.
- The vLLM, llama.cpp, and SGLang communities for their open-source contributions
- All issue reporters who helped identify and document these bugs
- Researchers and engineers working to improve LLM serving reliability