[1/n][CI] Load models in CI from S3 instead of HF #13205
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
This pull request has merge conflicts that must be resolved before it can be merged.
AsyncEngineArgs(model="s3://vllm-ci-model-weights/distilgpt2",
                load_format="runai_streamer",
Perhaps not something for this PR, but could we auto-detect the correct load_format if the model starts with s3://?
There is also tensorizer, which can also load from S3, so I don't want to default it to runai_streamer.
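For context, a minimal sketch of the auto-detection idea being discussed (not what this PR actually does, for the reason given above); infer_load_format is a hypothetical helper and the strings match the load formats mentioned in this thread:

def infer_load_format(model: str, load_format: str = "auto") -> str:
    # Default S3 paths to the RunAI streamer only when the caller didn't ask
    # for anything specific, so tensorizer users can still opt in explicitly.
    if load_format == "auto" and model.startswith("s3://"):
        return "runai_streamer"
    return load_format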
("distilbert/distilgpt2", "ray", "", "L4"), | ||
("distilbert/distilgpt2", "mp", "", "L4"), | ||
("meta-llama/Llama-2-7b-hf", "ray", "", "L4"), | ||
("meta-llama/Llama-2-7b-hf", "mp", "", "L4"), | ||
("facebook/opt-125m", "ray", "", "A100"), | ||
("facebook/opt-125m", "mp", "", "A100"), | ||
("facebook/opt-125m", "mp", "FLASHINFER", "A100"), | ||
("distilbert/distilgpt2", "ray", "", "A100"), | ||
("distilbert/distilgpt2", "mp", "", "A100"), | ||
("distilbert/distilgpt2", "mp", "FLASHINFER", "A100"), |
Should these also start with s3://?
These will get converted to s3:// because they go through the vllm_runner fixture, which checks whether the model is in the MODELS_ON_S3 list and, if so, converts it.
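A simplified sketch of that check-and-convert step; MODELS_ON_S3 and the bucket name come from this thread, but the body is illustrative rather than the fixture's actual code:

MODEL_WEIGHTS_S3_BUCKET = "s3://vllm-ci-model-weights"
MODELS_ON_S3 = [
    "distilbert/distilgpt2",
    "meta-llama/Llama-2-7b-hf",
]

def maybe_convert_to_s3(model: str, load_format: str) -> tuple[str, str]:
    # Rewrite an HF model id to its S3 mirror when CI hosts the weights,
    # e.g. "distilbert/distilgpt2" -> "s3://vllm-ci-model-weights/distilgpt2".
    if model in MODELS_ON_S3:
        return f"{MODEL_WEIGHTS_S3_BUCKET}/{model.split('/')[-1]}", "runai_streamer"
    return model, load_format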
Instead of hard-coding S3 paths, what if we used an environment variable?
Hmm, we can probably do it inside ...
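The environment-variable idea could look roughly like this; the variable name is a guess, since the comment above is cut off:

import os

# Hypothetical variable name; fall back to the bucket used in this PR.
MODEL_WEIGHTS_S3_BUCKET = os.environ.get("VLLM_CI_MODEL_WEIGHTS_BUCKET",
                                         "s3://vllm-ci-model-weights")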
tests/models/registry.py
Outdated
@@ -92,6 +92,7 @@ def check_available_online(
# yapf: disable
_TEXT_GENERATION_EXAMPLE_MODELS = {
    # [Decoder-only]
    "Qwen25ForCausalLM": _HfExamplesInfo("Qwen/Qwen2.5-7B-Instruct"),
This model should also use Qwen2ForCausalLM. You can add additional models to each _HfExamplesInfo via the extras attribute.
done
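For illustration, a self-contained stand-in for the registry pattern being suggested; the real _HfExamplesInfo in tests/models/registry.py has more fields, and the extras key name here is made up:

from dataclasses import dataclass, field
from typing import Mapping

@dataclass(frozen=True)
class _HfExamplesInfo:
    # Default checkpoint plus optional extra checkpoints for the same architecture.
    default: str
    extras: Mapping[str, str] = field(default_factory=dict)

_TEXT_GENERATION_EXAMPLE_MODELS = {
    # One Qwen2ForCausalLM entry covers both the Qwen2 and Qwen2.5 checkpoints.
    "Qwen2ForCausalLM": _HfExamplesInfo("Qwen/Qwen2-7B-Instruct",
                                        extras={"2.5": "Qwen/Qwen2.5-7B-Instruct"}),
}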
Signed-off-by: <>
@@ -9,7 +9,7 @@
from vllm.distributed import cleanup_dist_env_and_memory


def run_normal():
def run_normal_opt125m():
Why are we testing both facebook/opt-125m and s3://vllm-ci-model-weights/distilgpt2 here?
I was going to replace all facebook/opt-125m with distilgpt2, so I thought I might as well keep one test here to make sure it still works.
I can remove this too if we don't need opt125m
cc @mgoin
Left some comments - PTAL!
tests/engine/test_executor.py
Outdated
def test_custom_executor_type_checking(model):
    with pytest.raises(ValueError):
        engine_args = EngineArgs(model=model,
                                 load_format="runai_streamer",
For files with multiple tests, let's not hardcode this string here if we intend to use the same load_format for all tests in the file.
Instead, let's put it as a constant at the top: TEST_LOAD_FORMAT = LoadFormat.RUNAI_STREAMER.
Good point! I'm going to swap every hardcoded string with LoadFormat... so we have it typed correctly.
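A sketch of the resulting test-file layout, assuming LoadFormat is importable from vllm.config and the rest of the test body stays unchanged:

import pytest
from vllm.config import LoadFormat
from vllm.engine.arg_utils import EngineArgs

# One module-level constant instead of hardcoding "runai_streamer" in each test.
TEST_LOAD_FORMAT = LoadFormat.RUNAI_STREAMER

def test_custom_executor_type_checking(model):
    with pytest.raises(ValueError):
        engine_args = EngineArgs(model=model,
                                 load_format=TEST_LOAD_FORMAT)
        # ... the rest of the test (building the engine with an invalid
        # executor class) is unchanged and elided here.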
tests/entrypoints/llm/test_chat.py
Outdated
@@ -28,7 +30,8 @@ def test_chat():


def test_multi_chat():
    llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")
    llm = LLM(model=f"{MODEL_WEIGHTS_S3_BUCKET}/Llama-3.2-1B-Instruct",
              load_format="runai_streamer")
Ditto
done
vllm/entrypoints/llm.py
Outdated
@@ -171,6 +171,7 @@ def __init__(
    gpu_memory_utilization: float = 0.9,
    swap_space: float = 4,
    cpu_offload_gb: float = 0,
    load_format: Union[LoadFormat, str] = LoadFormat.AUTO,
Is this change required? LLM already takes **kwargs, no?
Ya, good point. I updated it; let's see if CI passes.
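Since extra keyword arguments to LLM are forwarded to the engine arguments (as the comment above notes), no new named parameter is needed; tests can simply pass load_format through:

from vllm import LLM

# load_format is forwarded via **kwargs to the engine arguments,
# so LLM.__init__ does not need a dedicated parameter for it.
llm = LLM(model="s3://vllm-ci-model-weights/distilgpt2",
          load_format="runai_streamer")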
Signed-off-by: <>
logger = init_logger(__name__)
Can we also revert this change?
Yea, but can we do it in the next PR? I don't really want to trigger another full CI run.
Sure!
I think this looks reasonable to me!
Signed-off-by: <> Co-authored-by: EC2 Default User <[email protected]>
Signed-off-by: <> Co-authored-by: EC2 Default User <[email protected]>
- Load models in CI from S3 with runai-model-streamer instead of HF by default (only a few test jobs so far, listed below)
- Add runai-model-streamer and ...-s3 into CI dependencies
- *config.json
- Strip / from the file path when determining the destination file path, so it doesn't default to /..., which the machine doesn't have access to write into
- Append / to the model S3 path if there's no / at the end, mainly to prevent cases where 2 models on the S3 bucket match the pattern and confuse the model loader (both path tweaks are sketched below)
- Handle model paths that start with /, which means the model path is an S3 path and the model name is already converted to S3Model().dir, which looks like /tmp/tmp3123..

Test jobs with models loaded from S3 (not all test files, just as many as I can): (not the openai/ ones yet, since they are set up with the remote server and the S3 model path somehow messed things up)
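The two path tweaks above can be pictured roughly like this; the helper names are hypothetical and the actual changes live in vLLM's S3/model-loader utilities, so treat this as a sketch rather than the PR's real code:

import os

def dest_file_path(s3_key: str, local_dir: str) -> str:
    # Strip the leading '/' so the file lands under local_dir instead of an
    # absolute path like '/...' that the CI machine cannot write into.
    return os.path.join(local_dir, s3_key.lstrip("/"))

def normalize_s3_prefix(model_path: str) -> str:
    # Append a trailing '/' so a prefix like '.../distilgpt2' cannot also
    # match another model whose name shares that prefix when listing the bucket.
    return model_path if model_path.endswith("/") else model_path + "/"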