
Run v1 benchmark and integrate with PyTorch OSS benchmark database #13068

Merged (19 commits) on Feb 17, 2025

Conversation

@huydhn (Contributor) commented Feb 11, 2025

This PR adds an option to benchmark v1 locally by setting VLLM_USE_V1=1. Because we don't yet have the hardware in OSS CI to run the v1 benchmark, I plan to run it periodically first on some internal Meta devgpus that I have at hand (A100/H100/MI300x).
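
For example, a hedged sketch of a local v1 run; the model flag is a placeholder for illustration, not this PR's exact invocation:

```python
import os
import subprocess

# Run the existing latency benchmark script against the v1 engine by
# setting VLLM_USE_V1=1 in the child process environment.
env = dict(os.environ, VLLM_USE_V1="1")
subprocess.run(
    ["python", "benchmarks/benchmark_latency.py",
     "--model", "meta-llama/Llama-3.1-8B-Instruct"],
    env=env,
    check=True,
)
```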

As there is no Buildkite on internal Meta hardware, the route I'm proposing to make the results available in open source is to integrate with the PyTorch OSS benchmark database. Having the benchmark data there would also allow us to build a proper dashboard for v1 like the one we have on PyTorch HUD, e.g. https://hud.pytorch.org/benchmark/llms?repoName=pytorch%2Fpytorch. Here is an example output for the latency benchmark.

This is a big change in how the vLLM benchmark integrates with the PyTorch benchmark infra going forward, so I will limit the scope to the latency benchmark for now and gather feedback from the vLLM team first. If this gets the green light, I will add the other benchmarks (serving, throughput) in subsequent PRs. After refactoring the change into a new benchmark_utils module, it seems easy to apply it to all three benchmarks (latency, serving, and throughput). A sketch of what such a helper could look like is shown below.
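
For illustration, a minimal sketch of a benchmark_utils conversion helper; the function name and record fields are assumptions loosely modeled on the PyTorch OSS benchmark database, not the module's actual API:

```python
import json
import os
from typing import Any


def write_to_pytorch_benchmark_format(benchmark_name: str,
                                      metrics: dict[str, list[float]],
                                      extra_info: dict[str, Any],
                                      output_path: str) -> None:
    # Emit one JSON record per metric; each record carries the benchmark
    # name, the model under test, and the raw measured values.
    records = [{
        "benchmark": {"name": benchmark_name, "extra_info": extra_info},
        "model": {"name": extra_info.get("model", "unknown")},
        "metric": {"name": metric, "benchmark_values": values},
    } for metric, values in metrics.items()]
    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
    with open(output_path, "w") as f:
        json.dump(records, f, indent=2)
```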

Note that this change doesn't interfere with the current v0 benchmark CI. The new code path is only executed by running VLLM_VERSION=v1 SAVE_TO_PYTORCH_BENCHMARK_FORMAT=1 bash .buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh. The final benchmark_results.md can also be downloaded from the PyTorch benchmark bucket.
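
For reference, a rough sketch of how such opt-in gating could look inside a benchmark script; only the environment variable name comes from this PR, the surrounding function is hypothetical:

```python
import os


def maybe_save_to_pytorch_benchmark_format(results: dict) -> None:
    # The new path is strictly opt-in, so the existing v0 benchmark CI,
    # which never sets this variable, is unaffected.
    if os.environ.get("SAVE_TO_PYTORCH_BENCHMARK_FORMAT") != "1":
        return
    # A real implementation would convert `results` and write the
    # PyTorch-format JSON here (see the benchmark_utils sketch above).
    print(f"Would save {len(results)} result fields in PyTorch format")
```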

cc @simon-mo @houseroad @youngkent

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs will not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the ci/build label Feb 11, 2025
@huydhn huydhn force-pushed the add-v1-benchmark-script branch from e9be0ec to 10971bd on February 11, 2025 02:30
Signed-off-by: Huy Do <[email protected]>
@youngkent (Contributor) commented:

cc: @ywang96 @simon-mo

@ywang96 ywang96 self-assigned this Feb 11, 2025
@ywang96 (Member) left a comment:

Left some comments - PTAL

(Resolved review thread on benchmarks/benchmark_latency.py)
@huydhn huydhn changed the title Run v1 latency benchmark and integrate with PyTorch OSS benchmark database Run v1 benchmark and integrate with PyTorch OSS benchmark database Feb 12, 2025
@ywang96 (Member) left a comment:

@huydhn Thank you for updating this PR with my comments! I think it's in much better shape now.

I left another round of comments, so please take a look. Also, is it possible for you to share example outputs from running the serving and throughput benchmarks, similar to the latency one you shared? (Edit: just saw your replies in the previous comment!)

(Two resolved review threads on benchmarks/benchmark_serving.py)
@huydhn huydhn requested a review from ywang96 February 14, 2025 20:45
```diff
@@ -1014,7 +1043,7 @@ def main(args: argparse.Namespace):
         default=None,
         help="Server or API base url if not using http host and port.",
     )
-    parser.add_argument("--host", type=str, default="localhost")
+    parser.add_argument("--host", type=str, default="127.0.0.1")
```
@huydhn (Contributor, Author) commented Feb 15, 2025:
Note: This change is needed to force the benchmark script to use IPv4. There is nothing wrong with resolving localhost to ::1 with IPv6, but IPv6 doesn't work on the internal Meta hardware where I plan to run the benchmark. So this seems much easier than trying to work out why IPv6 doesn't work there.
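
Concretely, a sketch of the change above with an explanatory comment added (illustrative wording, not the merged code):

```python
import argparse

parser = argparse.ArgumentParser()
# Default to the IPv4 loopback address instead of "localhost": on some
# internal Meta hardware, "localhost" resolves to the IPv6 ::1, which
# does not work there, so force IPv4 for the benchmark client.
parser.add_argument("--host", type=str, default="127.0.0.1")
```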

@ywang96 (Member) replied:
This is fine by me, but can you add a comment here regarding this?

@ywang96 (Member) left a comment:
Thanks for adding this integration! This looks good to me now!

@ywang96 (Member) commented Feb 17, 2025:

Oops - I forgot to turn on ready label and auto-merge. Doing it now!

@ywang96 ywang96 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Feb 17, 2025
@ywang96 ywang96 enabled auto-merge (squash) February 17, 2025 07:02
@ywang96 ywang96 merged commit 4518683 into vllm-project:main Feb 17, 2025
41 checks passed
@huydhn (Contributor, Author) commented Feb 17, 2025:

> Oops - I forgot to turn on ready label and auto-merge. Doing it now!

Thank you for the review!

Just FYI, while running the v1 benchmark, I think I have found an issue with the torch.compile cache and documented it here: #13392

panf2333 pushed a commit to yottalabsai/vllm that referenced this pull request Feb 18, 2025
Labels: ci/build, ready, structured-output