
[1/n][CI] Load models in CI from S3 instead of HF #13205

Merged

merged 23 commits into main from khluu/s3ci on Feb 19, 2025
Conversation

khluu (Collaborator) commented Feb 13, 2025

  1. Load some models from an S3 path with runai-model-streamer instead of from HF by default (only a few test jobs so far, listed below).
  2. Add runai-model-streamer and ...-s3 to the CI dependencies.
  3. Allow pulling more files from S3 than just *config.json.
  4. Strip the leading / from the file path when determining the destination file path, so it doesn't default to /..., which the machine doesn't have write access to (see the sketch after this list).
  5. Append / to the model's S3 path if it doesn't already end with one, mainly to prevent cases where two models in the S3 bucket match the same prefix and confuse the model loader.
  6. Don't look for files in the HF repo if the model path starts with /, which means it is an S3 path whose model name has already been converted to S3Model().dir (something like /tmp/tmp3123..).
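A rough sketch of the path handling described in items 4–6 (illustrative only; the helper names are made up here and this is not the actual loader code):

```python
import os

def normalize_s3_model_path(model_path: str) -> str:
    # Item 5: make sure the S3 prefix ends with "/", so a prefix like
    # "s3://bucket/distilgpt2" cannot also match "distilgpt2-v2".
    if not model_path.endswith("/"):
        model_path += "/"
    return model_path

def destination_file_path(local_dir: str, s3_key: str) -> str:
    # Item 4: strip the leading "/" from the object key so the file lands
    # under the temp directory instead of the filesystem root.
    return os.path.join(local_dir, s3_key.lstrip("/"))

def skip_hf_lookup(model: str) -> bool:
    # Item 6: a model path starting with "/" is already a local S3Model().dir
    # (e.g. /tmp/tmp3123..), so there is no HF repo to query.
    return model.startswith("/")
```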

Test jobs with models now loaded from S3 (not all test files, just as many as I could):

  • Entrypoints llm/ (I haven't done the openai/ tests yet since they are set up with the remote server and the S3 model path somehow messed things up)
  • Basic correctness
  • Basic models
  • Metrics & Tracing
  • Async Engine, Inputs, Utils, Worker Test
  • Samplers
  • Engine

EC2 Default User added 2 commits February 12, 2025 23:57

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀


mergify bot commented Feb 13, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @khluu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Feb 13, 2025
khluu changed the title from "[ci] Load models from S3 instead of HF" to "[WIP][CI] Load models from S3 instead of HF" Feb 13, 2025
khluu changed the title from "[WIP][CI] Load models from S3 instead of HF" to "[WIP][CI] Load models in CI from S3 instead of HF" Feb 13, 2025
mergify bot removed the needs-rebase label Feb 13, 2025
EC2 Default User added 5 commits February 13, 2025 08:30
khluu changed the title from "[WIP][CI] Load models in CI from S3 instead of HF" to "[1/n][CI] Load models in CI from S3 instead of HF" Feb 13, 2025
khluu marked this pull request as ready for review February 13, 2025 09:01
Comment on lines 143 to 144
AsyncEngineArgs(model="s3://vllm-ci-model-weights/distilgpt2",
load_format="runai_streamer",
Collaborator

Perhaps not something for this PR, but could we auto-detect the correct load_format if the model starts with s3://?

Collaborator Author

There is also tensorizer, which can also load from S3, so I don't want to default it to runai_streamer.
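For illustration, the auto-detection idea might look like the sketch below; the helper is hypothetical and, as noted above, this PR keeps load_format explicit because tensorizer can also serve s3:// paths:

```python
from typing import Optional

def infer_load_format(model: str, load_format: Optional[str] = None) -> str:
    # Hypothetical helper, not part of this PR.
    if load_format is not None:
        return load_format  # caller was explicit, respect it
    if model.startswith("s3://"):
        # Ambiguous in practice: both runai_streamer and tensorizer can read
        # from S3, which is why the PR does not pick a default here.
        return "runai_streamer"
    return "auto"
```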

Comment on lines 97 to 103
("distilbert/distilgpt2", "ray", "", "L4"),
("distilbert/distilgpt2", "mp", "", "L4"),
("meta-llama/Llama-2-7b-hf", "ray", "", "L4"),
("meta-llama/Llama-2-7b-hf", "mp", "", "L4"),
("facebook/opt-125m", "ray", "", "A100"),
("facebook/opt-125m", "mp", "", "A100"),
("facebook/opt-125m", "mp", "FLASHINFER", "A100"),
("distilbert/distilgpt2", "ray", "", "A100"),
("distilbert/distilgpt2", "mp", "", "A100"),
("distilbert/distilgpt2", "mp", "FLASHINFER", "A100"),
Collaborator

Should these also start with s3://?

Collaborator Author

These get converted to s3:// because they go through the vllm_runner fixture, which checks whether the model is in the MODELS_ON_S3 list and, if it is, converts it.
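A minimal sketch of that check, using names consistent with the rest of this PR (MODELS_ON_S3, MODEL_WEIGHTS_S3_BUCKET); the real logic lives in the vllm_runner test fixture, so treat the code below as illustrative only:

```python
from typing import Tuple

MODEL_WEIGHTS_S3_BUCKET = "s3://vllm-ci-model-weights"
MODELS_ON_S3 = [
    "distilbert/distilgpt2",
    "meta-llama/Llama-2-7b-hf",
    "facebook/opt-125m",
]

def maybe_swap_to_s3(model: str, load_format: str) -> Tuple[str, str]:
    # If the HF repo id is mirrored in the CI bucket, point at the S3 copy
    # and switch to the runai_streamer load format; otherwise leave it alone.
    if model in MODELS_ON_S3:
        model = f"{MODEL_WEIGHTS_S3_BUCKET}/{model.split('/')[-1]}"
        load_format = "runai_streamer"
    return model, load_format
```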

hmellor (Collaborator) commented Feb 13, 2025

Instead of hard-coding S3 paths, what if we used an environment variable (VLLM_CI or something) which, if set, will prepend s3://vllm-ci-model-weights/ to the model and set load_format="runai_streamer"?

khluu (Collaborator Author) commented Feb 13, 2025

> Instead of hard-coding S3 paths, what if we used an environment variable (VLLM_CI or something) which, if set, will prepend s3://vllm-ci-model-weights/ to the model and set load_format="runai_streamer"?

Hmm, we can probably do it inside the LLM class?
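Neither variant landed in this PR, but a sketch of the environment-variable idea being discussed (VLLM_CI is the reviewer's placeholder name, and the helper below is hypothetical) could look like:

```python
import os
from typing import Tuple

def resolve_ci_model(model: str) -> Tuple[str, str]:
    # Hypothetical: when VLLM_CI is set, rewrite the HF repo id to the CI
    # bucket and force the runai_streamer load format; otherwise leave it.
    if os.environ.get("VLLM_CI"):
        s3_model = f"s3://vllm-ci-model-weights/{model.split('/')[-1]}"
        return s3_model, "runai_streamer"
    return model, "auto"
```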

EC2 Default User added 2 commits February 18, 2025 05:53
@@ -92,6 +92,7 @@ def check_available_online(
# yapf: disable
_TEXT_GENERATION_EXAMPLE_MODELS = {
# [Decoder-only]
"Qwen25ForCausalLM": _HfExamplesInfo("Qwen/Qwen2.5-7B-Instruct"),
Member

This model should also use Qwen2ForCausalLM. You can register additional models in each _HfExamplesInfo via the extras attribute.
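For illustration, the suggestion might look roughly like the entry below; the extras keyword and its exact shape are assumed here, so check the real _HfExamplesInfo definition before copying:

```python
# Sketch only: registers the Qwen2.5 checkpoint as an extra of the existing
# Qwen2ForCausalLM entry instead of adding a separate Qwen25ForCausalLM key.
_TEXT_GENERATION_EXAMPLE_MODELS = {
    # [Decoder-only]
    "Qwen2ForCausalLM": _HfExamplesInfo("Qwen/Qwen2-7B-Instruct",
                                        extras={"2.5": "Qwen/Qwen2.5-7B-Instruct"}),
}
```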

Collaborator Author

done

EC2 Default User added 2 commits February 18, 2025 09:35
khluu removed the ready (ONLY add when PR is ready to merge/full CI is needed) label Feb 18, 2025
khluu added and removed the ready (ONLY add when PR is ready to merge/full CI is needed) label Feb 19, 2025
EC2 Default User added 3 commits February 19, 2025 01:10
@@ -9,7 +9,7 @@
 from vllm.distributed import cleanup_dist_env_and_memory


-def run_normal():
+def run_normal_opt125m():
Member

Why are we testing both facebook/opt-125m and s3://vllm-ci-model-weights/distilgpt2 here?

Collaborator Author

I was going to replace all facebook/opt-125m with distilgpt2, so I figured I might as well keep one test here to make sure opt-125m still works.

Collaborator Author

I can remove this one too if we don't need opt-125m.

Member

cc @mgoin

khluu requested a review from DarkLight1337 February 19, 2025 03:45
khluu requested a review from DarkLight1337 February 19, 2025 03:48
ywang96 (Member) left a comment

Left some comments - PTAL!

def test_custom_executor_type_checking(model):
    with pytest.raises(ValueError):
        engine_args = EngineArgs(model=model,
                                 load_format="runai_streamer",
Member

For files with multiple tests, let's not hardcode this string here if we intend to use the same load_format for all tests in the file.

Instead, let's put it as a constant at the top TEST_LOAD_FORMAT = LoadFormat.RUNAI_STREAMER

Collaborator Author

Good point! I'm going to swap any hardcoded strings with LoadFormat... so we have it typed correctly.
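A minimal sketch of the suggested pattern, with import paths assumed from upstream vLLM at the time and the test itself purely illustrative:

```python
# Sketch: one module-level constant reused by every test in the file.
from vllm.config import LoadFormat
from vllm.engine.arg_utils import EngineArgs

TEST_LOAD_FORMAT = LoadFormat.RUNAI_STREAMER

def test_engine_args_use_shared_load_format():
    engine_args = EngineArgs(
        model="s3://vllm-ci-model-weights/distilgpt2",
        load_format=TEST_LOAD_FORMAT,
    )
    assert engine_args.load_format == TEST_LOAD_FORMAT
```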

@@ -28,7 +30,8 @@ def test_chat():


 def test_multi_chat():
-    llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")
+    llm = LLM(model=f"{MODEL_WEIGHTS_S3_BUCKET}/Llama-3.2-1B-Instruct",
+              load_format="runai_streamer")
Member

Ditto

Collaborator Author

done

@@ -171,6 +171,7 @@ def __init__(
         gpu_memory_utilization: float = 0.9,
         swap_space: float = 4,
         cpu_offload_gb: float = 0,
+        load_format: Union[LoadFormat, str] = LoadFormat.AUTO,
Member

Is this change required? LLM already takes **kwargs, no?

Collaborator Author

Ya, good point.. I updated it; let's see if CI passes.
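In other words (a sketch under the assumption that LLM forwards extra keyword arguments on to EngineArgs, as the reviewer suggests), callers can already write:

```python
from vllm import LLM

# load_format reaches EngineArgs through **kwargs, so no new explicit
# parameter on LLM.__init__ is needed for the CI tests to pass it along.
llm = LLM(model="s3://vllm-ci-model-weights/distilgpt2",
          load_format="runai_streamer")
```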

khluu added the ready (ONLY add when PR is ready to merge/full CI is needed) label Feb 19, 2025
khluu requested a review from ywang96 February 19, 2025 04:54
Comment on lines +30 to +31
logger = init_logger(__name__)

Member

Can we also revert this change?

Collaborator Author

Yea, but can we do it in the next PR? I don't really want to trigger another full CI run.

Member

Sure!

ywang96 (Member) left a comment

I think this looks reasonable to me!

DarkLight1337 enabled auto-merge (squash) February 19, 2025 06:37
DarkLight1337 merged commit d5d214a into main Feb 19, 2025
73 of 74 checks passed
DarkLight1337 deleted the khluu/s3ci branch February 19, 2025 07:35
xjpang pushed a commit to xjpang/vllm that referenced this pull request Feb 20, 2025
kerthcet pushed a commit to kerthcet/vllm that referenced this pull request Feb 21, 2025
Labels
ci/build, frontend, ready (ONLY add when PR is ready to merge/full CI is needed), structured-output
4 participants