
Conversation

@eicherseiji (Contributor) commented Jul 10, 2025

Why are these changes needed?

These workarounds are no longer needed since the behavior was aligned upstream:

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(


@gemini-code-assist (bot) left a comment


Summary of Changes

Hello @eicherseiji, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on code cleanup and maintenance by removing several temporary workarounds. These workarounds were originally introduced to mitigate issues in upstream libraries, but with recent fixes integrated into those dependencies, they are no longer necessary, simplifying our codebase.

Highlights

  • Transformers Workaround Removal: I've removed the explicit init_hf_modules() call and its import from llm_engine.py. This workaround was previously needed to prevent pickle errors when using trust_remote_code=True with Hugging Face transformers models; the underlying issue has since been resolved upstream.
  • vLLM Environment Variable Cleanup: I've eliminated the conditional deletion of the CUDA_VISIBLE_DEVICES environment variable in vllm_engine.py. This specific workaround, tracked by a TODO referencing a vLLM pull request, is now obsolete because the fix has been merged upstream. A sketch of both removed workarounds follows this list.
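
For context, the removed workarounds looked roughly like the following. This is a hedged sketch rather than the exact diff: the helper names and call sites are illustrative, though init_hf_modules does come from transformers.dynamic_module_utils and the environment-variable handling uses only the standard library.

```python
import os

# Workaround 1 (removed): pre-create the Hugging Face dynamic-module cache so that
# classes loaded with trust_remote_code=True can be unpickled in Ray worker processes.
from transformers.dynamic_module_utils import init_hf_modules


def prepare_hf_dynamic_modules():
    # Ensures HF_MODULES_CACHE exists and is importable before any remote-code
    # model class is deserialized; without it, unpickling could fail.
    init_hf_modules()


# Workaround 2 (removed): clear a stale CUDA_VISIBLE_DEVICES value before engine
# start so that vLLM's own GPU assignment is not constrained by it.
def clear_cuda_visible_devices():
    if "CUDA_VISIBLE_DEVICES" in os.environ:
        del os.environ["CUDA_VISIBLE_DEVICES"]
```

With the upstream fixes in place, neither call should be needed, which is what this PR removes.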


@gemini-code-assist (bot) left a comment


Code Review

This pull request removes two workarounds that are no longer needed due to fixes in upstream libraries (transformers and vllm). The changes are correct and improve code maintainability.

@eicherseiji added the "go" label (add ONLY when ready to merge, run all tests) on Jul 10, 2025
@eicherseiji changed the title from "Remove upstreamed workarounds" to "[serve.llm] Remove upstreamed workarounds" on Jul 10, 2025
@eicherseiji (Contributor, Author) commented

Services failing to start due to:

File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 49, in create_connector_v0\n    assert issubclass(connector_cls, KVConnectorBase)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n',)('AssertionError\n',)

But it seems like we're using the V0 engine since the explicit env variable was removed in #54440:
https://github.com/vllm-project/vllm/blob/releases/v0.9.2/vllm/engine/arg_utils.py#L1479

INFO 07-11 14:59:26 [arg_utils.py:1746] Engine in background thread is experimental on VLLM_USE_V1=1. Falling back to V0 Engine.

Service link

But nightly tests are successful, so I need to determine why these changes are causing the services to fail.
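
One quick way to check which engine path was requested, independent of this PR (a diagnostic sketch that assumes the standard VLLM_USE_V1 switch and the vllm.envs module):

```python
import os

# Request the V1 engine explicitly before vLLM reads its environment settings.
os.environ["VLLM_USE_V1"] = "1"

import vllm.envs as envs

# vLLM can still fall back to the V0 engine for unsupported features (as the
# log line above shows), but this confirms which engine was requested.
print("VLLM_USE_V1 =", envs.VLLM_USE_V1)
```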


github-actions bot commented Aug 1, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions bot added the "stale" label (The issue is stale. It will be closed within 7 days unless there is further conversation.) on Aug 1, 2025
Signed-off-by: Seiji Eicher <[email protected]>
@github-actions bot added the "unstale" label (A PR that has been marked unstale. It will not get marked stale again if this label is on it.) and removed the "stale" label on Aug 15, 2025
@eicherseiji changed the title from "[serve.llm] Remove upstreamed workarounds" to "[serve.llm] Remove upstreamed workarounds 1/N" on Sep 2, 2025
@eicherseiji (Contributor, Author) commented Sep 2, 2025

I suspect that removing the vllm_config initialization via .remote() is responsible for the release test failures. I'm splitting that out into a separate PR to unblock these changes.

Next PR: #56170

@eicherseiji changed the title from "[serve.llm] Remove upstreamed workarounds 1/N" to "[serve.llm] Remove upstreamed workarounds 1/2" on Sep 2, 2025
@eicherseiji marked this pull request as ready for review on September 2, 2025 21:10
@eicherseiji requested a review from a team as a code owner on September 2, 2025 21:10
@eicherseiji (Contributor, Author) commented

The failing release test is jailed and will be fixed with #56104.

[Screenshot attached: 2025-09-02 2:26 PM]

@ray-gardener bot added the "serve" label (Ray Serve Related Issue) on Sep 3, 2025
@ray-gardener bot added the "llm" label on Sep 3, 2025
@eicherseiji changed the title from "[serve.llm] Remove upstreamed workarounds 1/2" to "[serve.llm] Remove upstreamed workarounds 1/3" on Sep 3, 2025
@eicherseiji (Contributor, Author) commented

The release test failure may have been a true positive. If the LMCache integration relies on setting CUDA_VISIBLE_DEVICES, we can't remove the lines setting it just yet. I'll update the comment if the test passes.
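
For reference, the kind of pinning that comment refers to looks roughly like this (a sketch only, not the actual code in vllm_engine.py; ray.get_gpu_ids() is the standard Ray API for the GPU IDs assigned to the current worker):

```python
import os

import ray


def pin_cuda_visible_devices() -> str:
    # Inside a Ray worker or actor, ray.get_gpu_ids() returns the GPU IDs that
    # Ray assigned to this process. Writing them back to CUDA_VISIBLE_DEVICES is
    # the kind of explicit pinning that a downstream integration such as LMCache
    # might read, which is why removing the lines that set it needs care.
    gpu_ids = ray.get_gpu_ids()
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(gpu_id) for gpu_id in gpu_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]
```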

@kouroshHakha merged commit d995940 into ray-project:master on Sep 3, 2025
5 checks passed
Labels: go (add ONLY when ready to merge, run all tests); llm; serve (Ray Serve Related Issue); unstale (A PR that has been marked unstale. It will not get marked stale again if this label is on it.)