Replies: 4 comments 2 replies
Finally found out why there's such a huge GPU memory usage difference lol. The counterpart of … Once I changed to …
Great benchmark comparison! At RevolutionAI (https://revolutionai.io), we have done similar vLLM vs SGLang comparisons. A few factors explain the typical differences:

Where vLLM often wins: …

Where SGLang may win: …

Benchmark methodology tips: …

Key insight: the "winner" depends heavily on your workload pattern. Batch inference? vLLM. Interactive chat with shared context? SGLang might edge ahead. What was your test setup: concurrent users, prompt lengths, model size?
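The "shared context" point can be made concrete with a toy sketch. This is not SGLang's actual RadixAttention code, just the counting idea: previously seen token prefixes live in a trie, and a new request only needs fresh KV computation for the suffix it has not seen before.

```python
# Toy model of prefix caching: store seen token sequences in a trie; a new
# request reuses cached work for its longest already-seen prefix.
# (Illustrative only -- not SGLang's real implementation.)

class PrefixCache:
    def __init__(self):
        self.root = {}

    def insert(self, tokens):
        """Insert a token sequence; return how many leading tokens were
        already cached (i.e., whose KV values could be reused)."""
        node, reused, matching = self.root, 0, True
        for tok in tokens:
            if matching and tok in node:
                reused += 1
            else:
                matching = False
            node = node.setdefault(tok, {})
        return reused


cache = PrefixCache()
system = [1, 2, 3, 4]                    # shared system-prompt tokens
print(cache.insert(system + [10, 11]))   # 0 reused: cold cache
print(cache.insert(system + [20, 21]))   # 4 reused: system prompt is shared
```

With many concurrent chat users sharing the same system prompt, the reused fraction grows with every request, which is exactly the interactive-chat pattern where prefix caching pays off.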
This vLLM vs SGLang benchmark is really valuable! Let me help explain some of the differences:

Why SGLang might be faster in some cases: …

Why vLLM might be faster in others: …

Key factors in your benchmark, and what would explain your specific results: …

We've benchmarked both extensively at RevolutionAI for production deployments. Generally vLLM wins on throughput and SGLang on latency, but your workload matters most. Can you share your test config?
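Whatever the config turns out to be, latency consistency is easy to quantify. A minimal Python sketch that times requests against either server's OpenAI-compatible endpoint and reports percentiles; the endpoint URL and model name are placeholders for whichever server is under test:

```python
import json
import statistics
import time
import urllib.request


def percentile(samples, p):
    """Nearest-rank percentile, e.g. p=0.99 for p99 latency."""
    ranked = sorted(samples)
    idx = min(len(ranked) - 1, max(0, int(round(p * len(ranked))) - 1))
    return ranked[idx]


def time_one_request(url, payload):
    """Send one completion request; return wall-clock latency in seconds."""
    start = time.perf_counter()
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).read()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder endpoint and model; point at your vLLM or SGLang server.
    url = "http://localhost:8000/v1/completions"
    payload = {"model": "qwen2.5-7b", "prompt": "Hello", "max_tokens": 64}
    lat = [time_one_request(url, payload) for _ in range(50)]
    print(f"mean={statistics.mean(lat):.3f}s "
          f"p50={percentile(lat, 0.50):.3f}s "
          f"p99={percentile(lat, 0.99):.3f}s")
```

The gap between p50 and p99 is what "consistent response times" means in practice: a tight gap indicates stable scheduling, a wide one indicates queuing or recomputation spikes.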
Great benchmarking work! The memory and performance differences come from architectural choices.

Why SGLang uses less memory (7 GB vs 21 GB): …

To make vLLM use less memory:

```shell
vllm serve qwen2.5-7b \
  --gpu-memory-utilization 0.3 \
  --max-model-len 2048 \
  --max-num-seqs 8
```

Why SGLang has more consistent latency: …

Fair comparison settings:

```shell
# vLLM
vllm serve --gpu-memory-utilization 0.3

# SGLang
python -m sglang.launch_server --mem-fraction-static 0.3
```

What to benchmark: …

We benchmark inference engines at RevolutionAI; memory utilization config is the biggest factor in these comparisons.
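The 21 GB figure lines up with vLLM's default preallocation behavior: vLLM reserves a fixed fraction of total GPU memory at startup (`--gpu-memory-utilization`, default 0.9) for weights plus KV-cache blocks, so observed usage reflects the reservation, not demand. A quick arithmetic sanity check on a 24 GB A10:

```python
# vLLM preallocates a fixed fraction of GPU memory at startup, so what
# nvidia-smi shows is the reservation, not actual demand.
A10_TOTAL_GB = 24
DEFAULT_UTIL = 0.9   # vLLM's default --gpu-memory-utilization

reserved_gb = A10_TOTAL_GB * DEFAULT_UTIL
print(f"default reservation: {reserved_gb:.1f} GB")   # 21.6 GB, matching the ~21 GB observed

# Lowering the fraction shrinks the reservation proportionally:
tuned_gb = A10_TOTAL_GB * 0.3
print(f"with --gpu-memory-utilization 0.3: {tuned_gb:.1f} GB")   # 7.2 GB
```

In other words, most of the 21 GB vs 7 GB gap is configuration, not efficiency: with both engines pinned to the same memory fraction, the comparison becomes much fairer.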
Hi, I am using vLLM for all my projects, but I had been thinking I should give SGLang a try, so I ran a performance test comparing them. Before the test I had no idea what result I would get, as I had no bias at all, and I was very surprised by the outcome!
I used one A10 GPU to test Qwen 2.5-7B, since I have a specific, focused goal: to evaluate how vLLM and SGLang perform when running a small LLM on a mid-range NVIDIA GPU like the A10.
I found that SGLang uses only 7 GB of GPU memory compared with vLLM's 21 GB (the A10 has 24 GB in total) and delivers much better results, especially more consistent response times.
But why is there such a big difference? Can someone help explain it? This is my project: https://github.com/qiulang/vllm-sglang-perf
Thanks a lot.