
Conversation

@eicherseiji (Contributor) commented Jul 10, 2025

Why are these changes needed?

These workarounds are no longer needed since the behavior was aligned upstream:

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(


@gemini-code-assist (bot) left a comment


Summary of Changes

Hello @eicherseiji, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on code cleanup and maintenance by removing several temporary workarounds. These workarounds were originally introduced to mitigate issues in upstream libraries, but with recent fixes integrated into those dependencies, they are no longer necessary, simplifying our codebase.

Highlights

  • Transformers Workaround Removal: I've removed the explicit init_hf_modules() call and its import from llm_engine.py. This workaround was previously needed to prevent pickle errors when using trust_remote_code=True with Hugging Face transformers models; the underlying issue has since been resolved upstream.
  • vLLM Environment Variable Cleanup: I've eliminated the conditional deletion of the CUDA_VISIBLE_DEVICES environment variable in vllm_engine.py. This specific workaround, tracked by a TODO referencing a vLLM pull request, is now obsolete because the fix has been merged upstream. A sketch of both removed workarounds follows this list.
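
For context, the removed workarounds looked roughly like the following. This is a hedged sketch rather than the exact diff: the helper names and call sites are illustrative, though init_hf_modules does come from transformers.dynamic_module_utils and the environment-variable handling uses only the standard library.

```python
import os

# Workaround 1 (removed): pre-create the Hugging Face dynamic-module cache so that
# classes loaded with trust_remote_code=True can be unpickled in Ray worker processes.
from transformers.dynamic_module_utils import init_hf_modules


def prepare_hf_dynamic_modules():
    # Ensures HF_MODULES_CACHE exists and is importable before any remote-code
    # model class is deserialized; without it, unpickling could fail.
    init_hf_modules()


# Workaround 2 (removed): clear a stale CUDA_VISIBLE_DEVICES value before engine
# start so that vLLM's own GPU assignment is not constrained by it.
def clear_cuda_visible_devices():
    if "CUDA_VISIBLE_DEVICES" in os.environ:
        del os.environ["CUDA_VISIBLE_DEVICES"]
```

With the upstream fixes in place, neither call should be needed, which is what this PR removes.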


@gemini-code-assist (bot) left a comment


Code Review

This pull request removes two workarounds that are no longer needed due to fixes in upstream libraries (transformers and vllm). The changes are correct and improve code maintainability.

@eicherseiji added the "go" label (add ONLY when ready to merge, run all tests) on Jul 10, 2025
@eicherseiji changed the title from "Remove upstreamed workarounds" to "[serve.llm] Remove upstreamed workarounds" on Jul 10, 2025
@eicherseiji (Contributor, Author) commented

Services failing to start due to:

File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 49, in create_connector_v0\n    assert issubclass(connector_cls, KVConnectorBase)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n',)('AssertionError\n',)

But it seems like we're using the V0 engine since the explicit env variable was removed in #54440:
https://github.com/vllm-project/vllm/blob/releases/v0.9.2/vllm/engine/arg_utils.py#L1479

INFO 07-11 14:59:26 [arg_utils.py:1746] Engine in background thread is experimental on VLLM_USE_V1=1. Falling back to V0 Engine.

Service link

But nightly tests are successful, so I need to determine why these changes are causing the services to fail.
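
One quick way to check which engine path was requested, independent of this PR (a diagnostic sketch that assumes the standard VLLM_USE_V1 switch and the vllm.envs module):

```python
import os

# Request the V1 engine explicitly before vLLM reads its environment settings.
os.environ["VLLM_USE_V1"] = "1"

import vllm.envs as envs

# vLLM can still fall back to the V0 engine for unsupported features (as the
# log line above shows), but this confirms which engine was requested.
print("VLLM_USE_V1 =", envs.VLLM_USE_V1)
```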


github-actions bot commented Aug 1, 2025

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions bot added the "stale" label (The issue is stale. It will be closed within 7 days unless there is further conversation.) on Aug 1, 2025
Signed-off-by: Seiji Eicher <[email protected]>
@github-actions bot added the "unstale" label (A PR that has been marked unstale. It will not get marked stale again if this label is on it.) and removed the "stale" label on Aug 15, 2025
@eicherseiji changed the title from "[serve.llm] Remove upstreamed workarounds" to "[serve.llm] Remove upstreamed workarounds 1/N" on Sep 2, 2025
@eicherseiji (Contributor, Author) commented Sep 2, 2025

I suspect that removing the vllm_config initialization via .remote() is responsible for the release test failures. I'm splitting that out into a separate PR to unblock these changes.

Next PR: #56170

@eicherseiji changed the title from "[serve.llm] Remove upstreamed workarounds 1/N" to "[serve.llm] Remove upstreamed workarounds 1/2" on Sep 2, 2025
@eicherseiji marked this pull request as ready for review on September 2, 2025 21:10
@eicherseiji requested a review from a team as a code owner on September 2, 2025 21:10
@eicherseiji (Contributor, Author) commented

The failing release test is jailed and will be fixed with #56104.

[Screenshot attached: 2025-09-02 2:26 PM]

@ray-gardener bot added the "serve" label (Ray Serve Related Issue) on Sep 3, 2025
@ray-gardener bot added the "llm" label on Sep 3, 2025
@eicherseiji changed the title from "[serve.llm] Remove upstreamed workarounds 1/2" to "[serve.llm] Remove upstreamed workarounds 1/3" on Sep 3, 2025
@eicherseiji (Contributor, Author) commented

The release test failure may have been a true positive. If the LMCache integration relies on setting CUDA_VISIBLE_DEVICES, we can't remove the lines setting it just yet. I'll update the comment if the test passes.
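
For reference, the kind of pinning that comment refers to looks roughly like this (a sketch only, not the actual code in vllm_engine.py; ray.get_gpu_ids() is the standard Ray API for the GPU IDs assigned to the current worker):

```python
import os

import ray


def pin_cuda_visible_devices() -> str:
    # Inside a Ray worker or actor, ray.get_gpu_ids() returns the GPU IDs that
    # Ray assigned to this process. Writing them back to CUDA_VISIBLE_DEVICES is
    # the kind of explicit pinning that a downstream integration such as LMCache
    # might read, which is why removing the lines that set it needs care.
    gpu_ids = ray.get_gpu_ids()
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(gpu_id) for gpu_id in gpu_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]
```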

@kouroshHakha merged commit d995940 into ray-project:master on Sep 3, 2025
5 checks passed
Labels: go (add ONLY when ready to merge, run all tests); llm; serve (Ray Serve Related Issue); unstale (A PR that has been marked unstale. It will not get marked stale again if this label is on it.)