Run v1 benchmark and integrate with PyTorch OSS benchmark database #13068
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a smaller subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
Left some comments - PTAL
@huydhn Thank you for updating this PR with my comments! I think it's in much better shape now.
I left another round of comments, so please take a look. Also, is it possible for you to share example outputs from running the serving and throughput benchmarks, similar to the latency one you shared? (Edit: just saw your replies in the previous comment!)
@@ -1014,7 +1043,7 @@ def main(args: argparse.Namespace):
         default=None,
         help="Server or API base url if not using http host and port.",
     )
-    parser.add_argument("--host", type=str, default="localhost")
+    parser.add_argument("--host", type=str, default="127.0.0.1")
Note: This change is needed to force the benchmark script to use IPv4. There is nothing wrong with resolving `localhost` to `::1` over IPv6, but IPv6 doesn't work on the internal Meta hardware where I plan to run the benchmark, so this seems much easier than trying to work out why IPv6 doesn't work there.
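For context, a minimal sketch (not part of this PR) of why pinning the default to 127.0.0.1 forces IPv4: on a dual-stack machine, `localhost` typically resolves to `::1` as well as `127.0.0.1`, so an HTTP client may try IPv6 first, whereas `127.0.0.1` only ever resolves to an IPv4 address.

```python
# Minimal sketch (not part of this PR): print the address families each
# host string resolves to. On dual-stack machines "localhost" usually
# yields both AF_INET6 (::1) and AF_INET (127.0.0.1), while "127.0.0.1"
# is IPv4-only, which is what the benchmark needs on hosts without IPv6.
import socket

for host in ("localhost", "127.0.0.1"):
    infos = socket.getaddrinfo(host, 8000, proto=socket.IPPROTO_TCP)
    families = sorted({socket.AddressFamily(info[0]).name for info in infos})
    print(f"{host!r} -> {families}")
```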
This is fine by me, but can you add a comment here regarding this?
Thanks for adding this integration! This looks good to me now!
Oops - I forgot to turn on the ready label and auto-merge. Doing it now!
Thank you for the review! Just FYI, while running the v1 benchmark, I think I have found an issue with the torch.compile cache and documented it here: #13392
This PR adds an option to benchmark v1 locally by setting `VLLM_USE_V1=1`. Because we don't have the hardware on OSS to run the v1 benchmark for now, I plan to run this periodically first on some internal Meta devgpus that I have at hand (A100/H100/MI300x).

As there is no Buildkite on internal Meta hardware, in order to make the results available in open source, the route I'm proposing here is to integrate with the PyTorch OSS benchmark database. Having the benchmark data there would also allow us to build a proper dashboard for v1 like the one we have on PyTorch HUD, i.e. https://hud.pytorch.org/benchmark/llms?repoName=pytorch%2Fpytorch. Here is an example output for the latency benchmark.

This is a big change in how the vLLM benchmark integrates with the PyTorch benchmark infra going forward. ~~I will limit the scope to only the latency benchmark for now and gather feedback from the vLLM team first. If this has the green light to go forward, I will add other benchmarks (serving, throughput) in subsequent PRs.~~ After refactoring the change into a new `benchmark_utils` module, it seems easy to just apply it to all 3 benchmarks (latency, serving, and throughput).

Note that this change doesn't interfere with the current v0 benchmark CI. The new code path is only executed by running `VLLM_VERSION=v1 SAVE_TO_PYTORCH_BENCHMARK_FORMAT=1 bash .buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh`. The final `benchmark_results.md` can also be downloaded from the PyTorch benchmark bucket, i.e. benchmark_results.md.

cc @simon-mo @houseroad @youngkent
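For readers unfamiliar with the PyTorch OSS benchmark database, below is a hedged sketch of what emitting one latency metric in a PyTorch-benchmark-style record could look like. The exact schema is owned by the PyTorch benchmark infra; the field names, helper function, and values here are illustrative assumptions, not the code added in this PR.

```python
# Illustrative sketch only: the record shape, field names, and helper below
# are assumptions about the PyTorch OSS benchmark format, not this PR's code.
import json


def to_pytorch_benchmark_record(model: str, metric: str, values: list[float]) -> dict:
    """Package a single metric into one benchmark record (assumed shape)."""
    return {
        "benchmark": {"name": "vLLM benchmark", "extra_info": {"vllm_version": "v1"}},
        "model": {"name": model},
        "metric": {"name": metric, "benchmark_values": values},
    }


# Example: three latency samples for one model, written as a JSON list of records.
record = to_pytorch_benchmark_record(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    metric="avg_latency",
    values=[4.23, 4.19, 4.25],
)
with open("benchmark_results_pytorch_format.json", "w") as f:
    json.dump([record], f, indent=2)
```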