Run v1 benchmark and integrate with PyTorch OSS benchmark database #13068
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a smaller subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
Left some comments - PTAL
@huydhn Thank you for updating this PR with my comments! I think it's in much better shape now.
I left another round of comments, so please take a look. Also, is it possible for you to share example outputs from running the serving and throughput benchmarks, similar to the latency one you shared? (Edit: just saw your replies in the previous comment!)
@@ -1014,7 +1043,7 @@ def main(args: argparse.Namespace):
         default=None,
         help="Server or API base url if not using http host and port.",
     )
-    parser.add_argument("--host", type=str, default="localhost")
+    parser.add_argument("--host", type=str, default="127.0.0.1")
Note: This change is needed to force the benchmark script to use IPv4. There is nothing wrong with resolving `localhost` to `::1` over IPv6, but IPv6 doesn't work on the internal Meta hardware where I plan to run the benchmark, so this seems much easier than trying to work out why IPv6 doesn't work there.
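For context, a minimal sketch (not part of this PR) of why pinning the default to 127.0.0.1 forces IPv4: on a dual-stack machine, `localhost` typically resolves to `::1` as well as `127.0.0.1`, so an HTTP client may try IPv6 first, whereas `127.0.0.1` only ever resolves to an IPv4 address.

```python
# Minimal sketch (not part of this PR): print the address families each
# host string resolves to. On dual-stack machines "localhost" usually
# yields both AF_INET6 (::1) and AF_INET (127.0.0.1), while "127.0.0.1"
# is IPv4-only, which is what the benchmark needs on hosts without IPv6.
import socket

for host in ("localhost", "127.0.0.1"):
    infos = socket.getaddrinfo(host, 8000, proto=socket.IPPROTO_TCP)
    families = sorted({socket.AddressFamily(info[0]).name for info in infos})
    print(f"{host!r} -> {families}")
```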
This is fine by me, but can you add a comment here regarding this?
Thanks for adding this integration! This looks good to me now!
Oops - I forgot to turn on the ready label and auto-merge. Doing it now!
Thank you for the review! Just FYI, while running the v1 benchmark, I think I have found an issue with the torch.compile cache and documented it here: #13392
This PR adds an option to benchmark v1 locally by setting `VLLM_USE_V1=1`. Because we don't have the hardware on OSS to run the v1 benchmark for now, I plan to run this periodically first on some internal Meta devgpus that I have at hand (A100/H100/MI300x).

As there is no Buildkite on internal Meta hardware, in order to make the results available in open source, the route I'm proposing here is to integrate with the PyTorch OSS benchmark database. Having the benchmark data there would also allow us to build a proper dashboard for v1 like the one we have on PyTorch HUD, i.e. https://hud.pytorch.org/benchmark/llms?repoName=pytorch%2Fpytorch. Here is an example output for the latency benchmark.

This is a big change in how the vLLM benchmark integrates with the PyTorch benchmark infra going forward. ~~I will limit the scope to only the latency benchmark for now and gather feedback from the vLLM team first. If this has the green light to go forward, I will add other benchmarks (serving, throughput) in subsequent PRs.~~ After refactoring the change into a new `benchmark_utils` module, it seems easy to just apply it to all 3 benchmarks (latency, serving, and throughput).

Note that this change doesn't interfere with the current v0 benchmark CI. The new code path is only executed by running `VLLM_VERSION=v1 SAVE_TO_PYTORCH_BENCHMARK_FORMAT=1 bash .buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh`. The final `benchmark_results.md` can also be downloaded from the PyTorch benchmark bucket, i.e. benchmark_results.md.

cc @simon-mo @houseroad @youngkent
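For readers unfamiliar with the PyTorch OSS benchmark database, below is a hedged sketch of what emitting one latency metric in a PyTorch-benchmark-style record could look like. The exact schema is owned by the PyTorch benchmark infra; the field names, helper function, and values here are illustrative assumptions, not the code added in this PR.

```python
# Illustrative sketch only: the record shape, field names, and helper below
# are assumptions about the PyTorch OSS benchmark format, not this PR's code.
import json


def to_pytorch_benchmark_record(model: str, metric: str, values: list[float]) -> dict:
    """Package a single metric into one benchmark record (assumed shape)."""
    return {
        "benchmark": {"name": "vLLM benchmark", "extra_info": {"vllm_version": "v1"}},
        "model": {"name": model},
        "metric": {"name": metric, "benchmark_values": values},
    }


# Example: three latency samples for one model, written as a JSON list of records.
record = to_pytorch_benchmark_record(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    metric="avg_latency",
    values=[4.23, 4.19, 4.25],
)
with open("benchmark_results_pytorch_format.json", "w") as f:
    json.dump([record], f, indent=2)
```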