A systematic study of production bugs and performance issues in three major open-source LLM serving frameworks: vLLM, llama.cpp, and SGLang.
This project analyzes 12,349 GitHub issues across three leading LLM serving frameworks to identify common production failure modes, bug patterns, and performance bottlenecks. Our goal is to improve the reliability and robustness of LLM deployments in production environments.
- vLLM: 4,078 issues analyzed (2,225 bugs/performance issues)
- llama.cpp: 5,470 issues analyzed (2,601 bugs/performance issues)
- SGLang: 2,567 issues analyzed (106 bugs/performance issues)
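For a quick sanity check, the per-framework bug share can be recomputed from the counts above with plain Python (no project code required):

```python
# Recompute the bug/performance share per framework from the counts above.
counts = {
    "vLLM": (4078, 2225),
    "llama.cpp": (5470, 2601),
    "SGLang": (2567, 106),
}

for name, (total, bugs) in counts.items():
    print(f"{name}: {bugs}/{total} issues flagged ({bugs / total:.1%})")
```

This puts the flagged-issue rate at roughly 55% for vLLM, 48% for llama.cpp, and 4% for SGLang.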
- Identify and categorize common production bugs in LLM serving systems
- Create a comprehensive bug taxonomy for LLM infrastructure
- Analyze critical failure modes and their root causes
- Provide actionable recommendations for improving production reliability
- Publish findings to help the community build more robust LLM systems
.
├── docs/
│   └── research_paper/              # Research documentation and findings
│       ├── research_plan.md
│       ├── bug_taxonomy.md
│       ├── critical_bugs_analysis.md
│       └── paper_outline.md
├── scripts/                         # Data collection and analysis tools
│   ├── scrape_github_issues.py
│   ├── analyze_issues.py
│   └── issues_datasets/             # JSON datasets of scraped issues
├── vllm/                            # vLLM framework demo and testing
├── llama-cpp-demo/                  # llama.cpp framework demo
├── sglang-demo/                     # SGLang framework demo
└── scheduler/                       # Additional research components
Our analysis identified 7 major categories of production bugs:
- Memory Management Issues - OOM errors, memory leaks, VRAM fragmentation
- Concurrency & Synchronization - Race conditions, deadlocks, cache corruption
- GPU/CUDA Issues - Device errors, multi-GPU synchronization problems
- API/Protocol Issues - Request handling failures, streaming errors
- Model-Specific Bugs - Loading failures, inference errors
- Performance Degradation - Latency spikes, throughput bottlenecks
- Scaling & Distribution - Multi-GPU coordination, cluster issues
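To make the taxonomy concrete, here is a toy keyword-based classifier in the spirit of our labeling pass. The keyword lists are illustrative assumptions, not the heuristics actually used in scripts/analyze_issues.py:

```python
from typing import List

# Illustrative keyword lists only; the project's real classifier lives in
# scripts/analyze_issues.py and may use different heuristics.
CATEGORY_KEYWORDS = {
    "Memory Management": ["oom", "out of memory", "memory leak", "vram"],
    "Concurrency & Synchronization": ["race condition", "deadlock", "hang"],
    "GPU/CUDA": ["cuda", "nccl", "illegal memory access", "device-side assert"],
    "API/Protocol": ["streaming", "openai api", "http 500"],
    "Performance Degradation": ["slow", "latency", "throughput", "regression"],
}

def classify(title: str) -> List[str]:
    """Return every category whose keywords appear in an issue title."""
    text = title.lower()
    return [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]

print(classify("CUDA error: out of memory when batch size > 32"))
# ['Memory Management', 'GPU/CUDA']
```

Note that a single issue can land in multiple categories, which matches how real reports mix symptoms (an OOM that surfaces as a CUDA error, for example).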
- Python 3.8+
- Git
- (Optional) CUDA-capable GPU for framework testing
- Clone the repository:
git clone https://github.com/yunwei37/vllm-exp.git
cd vllm-exp
- Set up Python environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
- (Optional) Set up framework demos:
# For vLLM demo
cd vllm && bash quick_start.sh
# For llama.cpp demo
cd llama-cpp-demo && bash quick_start.sh
# For SGLang demo
cd sglang-demo && bash quick_start.sh
- Scrape issues from GitHub (requires a GitHub token):
python scripts/scrape_github_issues.py --repo vllm-project/vllm --output vllm_issues.json
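A scraper of this kind essentially pages through the GitHub REST API's issues endpoint. Below is a minimal self-contained sketch; the actual scrape_github_issues.py adds CLI arguments, rate-limit handling, and richer filtering, and `GITHUB_TOKEN` is assumed to be set in the environment:

```python
import json
import os

import requests

def fetch_issues(repo, token):
    """Page through the GitHub REST API and collect all issues for a repo."""
    issues, page = [], 1
    headers = {"Authorization": f"token {token}"}
    while True:
        resp = requests.get(
            f"https://api.github.com/repos/{repo}/issues",
            headers=headers,
            params={"state": "all", "per_page": 100, "page": page},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        # The /issues endpoint also returns pull requests; skip them.
        issues.extend(i for i in batch if "pull_request" not in i)
        page += 1
    return issues

if __name__ == "__main__":
    data = fetch_issues("vllm-project/vllm", os.environ["GITHUB_TOKEN"])
    with open("vllm_issues.json", "w") as f:
        json.dump(data, f)
```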
- Analyze scraped issues:
python scripts/analyze_issues.py --input vllm_issues.json --output analysis_results.json
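At its core, the analysis step aggregates over the scraped JSON. Here is a minimal sketch that tallies GitHub labels; the field names (`labels`, `name`) follow the GitHub API schema, and the project's analyze_issues.py layers the taxonomy above on top of counts like these:

```python
import json
from collections import Counter

# Tally GitHub labels across the scraped issues.
with open("vllm_issues.json") as f:
    issues = json.load(f)

label_counts = Counter(
    label["name"] for issue in issues for label in issue.get("labels", [])
)

for name, count in label_counts.most_common(10):
    print(f"{count:5d}  {name}")
```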
See our comprehensive research documentation:
- [Research Plan](docs/research_paper/research_plan.md) - Overview of methodology
- [Bug Taxonomy](docs/research_paper/bug_taxonomy.md) - Detailed bug categorization
- [Critical Bugs Analysis](docs/research_paper/critical_bugs_analysis.md) - Deep dive into severe issues
- Memory management is the leading cause of production failures across all three frameworks
- Concurrency bugs are especially hard to diagnose because their non-deterministic behavior resists reproduction
- GPU/CUDA issues often manifest only under high load or on specific hardware
- API compatibility problems frequently surface during framework upgrades
- Performance regressions are common but often go unnoticed until they reach production
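For the last point, one cheap mitigation is a latency gate in CI that fails when a benchmark drifts past a stored baseline. A minimal sketch follows; the baseline value, 20% threshold, and fake workload are all illustrative placeholders:

```python
import statistics
import time

THRESHOLD = 1.20     # fail if p50 exceeds baseline by >20% (illustrative)
BASELINE_P50 = 0.85  # seconds; would normally come from a stored baseline

def measure_p50(run_once, trials=20):
    """Median wall-clock latency of `run_once` over `trials` runs."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def fake_request():
    time.sleep(0.01)  # stand-in for a real inference request

p50 = measure_p50(fake_request)
assert p50 <= BASELINE_P50 * THRESHOLD, (
    f"latency regression: p50={p50:.3f}s exceeds "
    f"{THRESHOLD:.0%} of baseline {BASELINE_P50:.3f}s"
)
```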
We welcome contributions! Areas where help is needed:
- Expanding the bug taxonomy with new categories
- Adding analysis for more LLM frameworks
- Improving issue classification accuracy
- Creating automated bug detection tools
If you use this research in your work, please cite:
@misc{llm-production-bugs-2025,
  title={A Systematic Study of Production Bugs in LLM Serving Frameworks},
  author={[Authors]},
  year={2025},
  url={https://github.com/yunwei37/vllm-exp}
}
This project is licensed under the MIT License - see the LICENSE file for details.
- The vLLM, llama.cpp, and SGLang communities for their open-source contributions
- All issue reporters who helped identify and document these bugs
- Researchers and engineers working to improve LLM serving reliability