
[1/n][CI] Load models in CI from S3 instead of HF #13205

Merged

merged 23 commits into main from khluu/s3ci on Feb 19, 2025
Conversation

khluu (Collaborator) commented Feb 13, 2025

  1. Load some models from an S3 path with runai-model-streamer instead of from HF by default (only a few test jobs so far, listed below).
  2. Add runai-model-streamer and ...-s3 to the CI dependencies.
  3. Allow pulling more files from S3 than just *config.json.
  4. Strip the leading / from the file path when determining the destination file path, so it doesn't default to /..., which the machine doesn't have write access to (see the sketch after this list).
  5. Append / to the model's S3 path if it doesn't already end with one, mainly to prevent cases where two models in the S3 bucket match the same prefix and confuse the model loader.
  6. Don't look for files in the HF repo if the model path starts with /, which means it is an S3 path whose model name has already been converted to S3Model().dir (something like /tmp/tmp3123..).
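A rough sketch of the path handling described in items 4–6 (illustrative only; the helper names are made up here and this is not the actual loader code):

```python
import os

def normalize_s3_model_path(model_path: str) -> str:
    # Item 5: make sure the S3 prefix ends with "/", so a prefix like
    # "s3://bucket/distilgpt2" cannot also match "distilgpt2-v2".
    if not model_path.endswith("/"):
        model_path += "/"
    return model_path

def destination_file_path(local_dir: str, s3_key: str) -> str:
    # Item 4: strip the leading "/" from the object key so the file lands
    # under the temp directory instead of the filesystem root.
    return os.path.join(local_dir, s3_key.lstrip("/"))

def skip_hf_lookup(model: str) -> bool:
    # Item 6: a model path starting with "/" is already a local S3Model().dir
    # (e.g. /tmp/tmp3123..), so there is no HF repo to query.
    return model.startswith("/")
```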

Test jobs with models now loaded from S3 (not all test files, just as many as I could):

  • Entrypoints llm/ (I haven't done the openai/ tests yet since they are set up with the remote server and the S3 model path somehow messed things up)
  • Basic correctness
  • Basic models
  • Metrics & Tracing
  • Async Engine, Inputs, Utils, Worker Test
  • Samplers
  • Engine

EC2 Default User added 2 commits February 12, 2025 23:57

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀


mergify bot commented Feb 13, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @khluu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Feb 13, 2025
khluu changed the title from "[ci] Load models from S3 instead of HF" to "[WIP][CI] Load models from S3 instead of HF" Feb 13, 2025
khluu changed the title from "[WIP][CI] Load models from S3 instead of HF" to "[WIP][CI] Load models in CI from S3 instead of HF" Feb 13, 2025
mergify bot removed the needs-rebase label Feb 13, 2025
EC2 Default User added 5 commits February 13, 2025 08:30
khluu changed the title from "[WIP][CI] Load models in CI from S3 instead of HF" to "[1/n][CI] Load models in CI from S3 instead of HF" Feb 13, 2025
khluu marked this pull request as ready for review February 13, 2025 09:01
Comment on lines 143 to 144
AsyncEngineArgs(model="s3://vllm-ci-model-weights/distilgpt2",
load_format="runai_streamer",
Collaborator

Perhaps not something for this PR, but could we auto-detect the correct load_format if the model starts with s3://?

Collaborator Author

There is also tensorizer, which can also load from S3, so I don't want to default it to runai_streamer.
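For illustration, the auto-detection idea might look like the sketch below; the helper is hypothetical and, as noted above, this PR keeps load_format explicit because tensorizer can also serve s3:// paths:

```python
from typing import Optional

def infer_load_format(model: str, load_format: Optional[str] = None) -> str:
    # Hypothetical helper, not part of this PR.
    if load_format is not None:
        return load_format  # caller was explicit, respect it
    if model.startswith("s3://"):
        # Ambiguous in practice: both runai_streamer and tensorizer can read
        # from S3, which is why the PR does not pick a default here.
        return "runai_streamer"
    return "auto"
```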

Comment on lines 97 to 103
("distilbert/distilgpt2", "ray", "", "L4"),
("distilbert/distilgpt2", "mp", "", "L4"),
("meta-llama/Llama-2-7b-hf", "ray", "", "L4"),
("meta-llama/Llama-2-7b-hf", "mp", "", "L4"),
("facebook/opt-125m", "ray", "", "A100"),
("facebook/opt-125m", "mp", "", "A100"),
("facebook/opt-125m", "mp", "FLASHINFER", "A100"),
("distilbert/distilgpt2", "ray", "", "A100"),
("distilbert/distilgpt2", "mp", "", "A100"),
("distilbert/distilgpt2", "mp", "FLASHINFER", "A100"),
Collaborator

Should these also start with s3://?

Collaborator Author

These get converted to s3:// because they go through the vllm_runner fixture, which checks whether the model is in the MODELS_ON_S3 list and, if it is, converts it.
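A minimal sketch of that check, using names consistent with the rest of this PR (MODELS_ON_S3, MODEL_WEIGHTS_S3_BUCKET); the real logic lives in the vllm_runner test fixture, so treat the code below as illustrative only:

```python
from typing import Tuple

MODEL_WEIGHTS_S3_BUCKET = "s3://vllm-ci-model-weights"
MODELS_ON_S3 = [
    "distilbert/distilgpt2",
    "meta-llama/Llama-2-7b-hf",
    "facebook/opt-125m",
]

def maybe_swap_to_s3(model: str, load_format: str) -> Tuple[str, str]:
    # If the HF repo id is mirrored in the CI bucket, point at the S3 copy
    # and switch to the runai_streamer load format; otherwise leave it alone.
    if model in MODELS_ON_S3:
        model = f"{MODEL_WEIGHTS_S3_BUCKET}/{model.split('/')[-1]}"
        load_format = "runai_streamer"
    return model, load_format
```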

hmellor (Collaborator) commented Feb 13, 2025

Instead of hard-coding S3 paths, what if we used an environment variable (VLLM_CI or something) which, if set, will prepend s3://vllm-ci-model-weights/ to the model and set load_format="runai_streamer"?

khluu (Collaborator Author) commented Feb 13, 2025

> Instead of hard-coding S3 paths, what if we used an environment variable (VLLM_CI or something) which, if set, will prepend s3://vllm-ci-model-weights/ to the model and set load_format="runai_streamer"?

Hmm, we can probably do it inside the LLM class?
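Neither variant landed in this PR, but a sketch of the environment-variable idea being discussed (VLLM_CI is the reviewer's placeholder name, and the helper below is hypothetical) could look like:

```python
import os
from typing import Tuple

def resolve_ci_model(model: str) -> Tuple[str, str]:
    # Hypothetical: when VLLM_CI is set, rewrite the HF repo id to the CI
    # bucket and force the runai_streamer load format; otherwise leave it.
    if os.environ.get("VLLM_CI"):
        s3_model = f"s3://vllm-ci-model-weights/{model.split('/')[-1]}"
        return s3_model, "runai_streamer"
    return model, "auto"
```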

EC2 Default User added 2 commits February 18, 2025 05:53
@@ -92,6 +92,7 @@ def check_available_online(
# yapf: disable
_TEXT_GENERATION_EXAMPLE_MODELS = {
# [Decoder-only]
"Qwen25ForCausalLM": _HfExamplesInfo("Qwen/Qwen2.5-7B-Instruct"),
Member

This model should also use Qwen2ForCausalLM. You can register additional models in each _HfExamplesInfo via the extras attribute.
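For illustration, the suggestion might look roughly like the entry below; the extras keyword and its exact shape are assumed here, so check the real _HfExamplesInfo definition before copying:

```python
# Sketch only: registers the Qwen2.5 checkpoint as an extra of the existing
# Qwen2ForCausalLM entry instead of adding a separate Qwen25ForCausalLM key.
_TEXT_GENERATION_EXAMPLE_MODELS = {
    # [Decoder-only]
    "Qwen2ForCausalLM": _HfExamplesInfo("Qwen/Qwen2-7B-Instruct",
                                        extras={"2.5": "Qwen/Qwen2.5-7B-Instruct"}),
}
```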

Collaborator Author

done

EC2 Default User added 2 commits February 18, 2025 09:35
khluu removed the ready (ONLY add when PR is ready to merge/full CI is needed) label Feb 18, 2025
khluu added and removed the ready (ONLY add when PR is ready to merge/full CI is needed) label Feb 19, 2025
EC2 Default User added 3 commits February 19, 2025 01:10
@@ -9,7 +9,7 @@
 from vllm.distributed import cleanup_dist_env_and_memory


-def run_normal():
+def run_normal_opt125m():
Member

Why are we testing both facebook/opt-125m and s3://vllm-ci-model-weights/distilgpt2 here?

Collaborator Author

I was going to replace all facebook/opt-125m with distilgpt2, so I figured I might as well keep one test here to make sure opt-125m still works.

Collaborator Author

I can remove this one too if we don't need opt-125m.

Member

cc @mgoin

khluu requested a review from DarkLight1337 February 19, 2025 03:45
khluu requested a review from DarkLight1337 February 19, 2025 03:48
ywang96 (Member) left a comment

Left some comments - PTAL!

def test_custom_executor_type_checking(model):
    with pytest.raises(ValueError):
        engine_args = EngineArgs(model=model,
                                 load_format="runai_streamer",
Member

For files with multiple tests, let's not hardcode this string here if we intend to use the same load_format for all tests in the file.

Instead, let's put it as a constant at the top TEST_LOAD_FORMAT = LoadFormat.RUNAI_STREAMER

Collaborator Author

Good point! I'm going to swap any hardcoded strings with LoadFormat... so we have it typed correctly.
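A minimal sketch of the suggested pattern, with import paths assumed from upstream vLLM at the time and the test itself purely illustrative:

```python
# Sketch: one module-level constant reused by every test in the file.
from vllm.config import LoadFormat
from vllm.engine.arg_utils import EngineArgs

TEST_LOAD_FORMAT = LoadFormat.RUNAI_STREAMER

def test_engine_args_use_shared_load_format():
    engine_args = EngineArgs(
        model="s3://vllm-ci-model-weights/distilgpt2",
        load_format=TEST_LOAD_FORMAT,
    )
    assert engine_args.load_format == TEST_LOAD_FORMAT
```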

@@ -28,7 +30,8 @@ def test_chat():


 def test_multi_chat():
-    llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")
+    llm = LLM(model=f"{MODEL_WEIGHTS_S3_BUCKET}/Llama-3.2-1B-Instruct",
+              load_format="runai_streamer")
Member

Ditto

Collaborator Author

done

@@ -171,6 +171,7 @@ def __init__(
         gpu_memory_utilization: float = 0.9,
         swap_space: float = 4,
         cpu_offload_gb: float = 0,
+        load_format: Union[LoadFormat, str] = LoadFormat.AUTO,
Member

Is this change required? LLM already takes **kwargs, no?

Collaborator Author

Ya, good point.. I updated it; let's see if CI passes.
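In other words (a sketch under the assumption that LLM forwards extra keyword arguments on to EngineArgs, as the reviewer suggests), callers can already write:

```python
from vllm import LLM

# load_format reaches EngineArgs through **kwargs, so no new explicit
# parameter on LLM.__init__ is needed for the CI tests to pass it along.
llm = LLM(model="s3://vllm-ci-model-weights/distilgpt2",
          load_format="runai_streamer")
```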

khluu added the ready (ONLY add when PR is ready to merge/full CI is needed) label Feb 19, 2025
khluu requested a review from ywang96 February 19, 2025 04:54
Comment on lines +30 to +31
logger = init_logger(__name__)

Member

Can we also revert this change?

Collaborator Author

Yea, but can we do it in the next PR? I don't really want to trigger another full CI run.

Member

Sure!

ywang96 (Member) left a comment

I think this looks reasonable to me!

DarkLight1337 enabled auto-merge (squash) February 19, 2025 06:37
DarkLight1337 merged commit d5d214a into main Feb 19, 2025
73 of 74 checks passed
DarkLight1337 deleted the khluu/s3ci branch February 19, 2025 07:35
xjpang pushed a commit to xjpang/vllm that referenced this pull request Feb 20, 2025
kerthcet pushed a commit to kerthcet/vllm that referenced this pull request Feb 21, 2025
Labels
ci/build, frontend, ready (ONLY add when PR is ready to merge/full CI is needed), structured-output
4 participants