[Bug]: ModuleNotFoundError: No module named 'ray' #854

Open
gizbo opened this issue Dec 2, 2024 · 4 comments
Labels
bug Something isn't working

Comments

gizbo commented Dec 2, 2024

Your current environment

N/A

🐛 Describe the bug

Hello,
Running the provided quickstart Docker run command, and getting the following error:

INFO: Multiprocessing frontend to use ipc:///tmp/3f2ae52b-cfde-4764-ad60-361c1c2ced18 for RPC Path.
INFO: Started engine process with PID 57
Process SpawnProcess-1:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/aphrodite/executor/ray_utils.py", line 13, in
import ray
ModuleNotFoundError: No module named 'ray'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.10/dist-packages/aphrodite/endpoints/openai/rpc/server.py", line 214, in run_rpc_server
server = AsyncEngineRPCServer(async_engine_args, rpc_path)
File "/usr/local/lib/python3.10/dist-packages/aphrodite/endpoints/openai/rpc/server.py", line 29, in init
self.engine = AsyncAphrodite.from_engine_args(async_engine_args)
File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 703, in from_engine_args
engine_config = engine_args.create_engine_config()
File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/args_tools.py", line 936, in create_engine_config
parallel_config = ParallelConfig(
File "/usr/local/lib/python3.10/dist-packages/aphrodite/common/config.py", line 963, in init
raise ValueError("Unable to load Ray which is "
ValueError: Unable to load Ray which is required for multi-node inference, please install Ray with `pip install ray`.
No module named 'ray'
^CTraceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/dist-packages/aphrodite/endpoints/openai/api_server.py", line 802, in
asyncio.run(run_server(args))
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/usr/lib/python3.10/asyncio/base_events.py", line 1871, in _run_once
event_list = self._selector.select(timeout)
File "/usr/lib/python3.10/selectors.py", line 469, in select
fd_event_list = self._selector.poll(timeout, max_ev)
Thanks
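
One immediate workaround while this is open, since the error itself suggests `pip install ray`: extend the image with Ray preinstalled. A minimal, untested sketch (the derived tag aphrodite-openai-ray is made up):

    # Build a derived image with Ray added on top of the published one
    docker build -t aphrodite-openai-ray - <<'EOF'
    FROM alpindale/aphrodite-openai:latest
    RUN pip install ray
    EOF

The quickstart command can then point at aphrodite-openai-ray instead of alpindale/aphrodite-openai:latest.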

gizbo added the bug label Dec 2, 2024
@AlpinDale
Member

Can you share your Docker command? Ray should not be used unless you launch the engine with --worker-use-ray or --distributed-executor-backend=ray.
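
For what it's worth, whether the image ships Ray at all is easy to check directly; a quick probe, assuming python3 is on the container's PATH:

    # Run the interpreter instead of the API server and try the import
    docker run --rm --entrypoint python3 alpindale/aphrodite-openai:latest -c "import ray"

If that raises the same ModuleNotFoundError, the package is simply absent from the image.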

@gizbo
Author

gizbo commented Dec 2, 2024

Hey, thanks for the quick reply. I was trying the command from the README.md:
Docker
Additionally, we provide a Docker image for easy deployment. Here's a basic command to get you started:

docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    #--env "CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7" \
    -p 2242:2242 \
    --ipc=host \
    alpindale/aphrodite-openai:latest \
    --model NousResearch/Meta-Llama-3.1-8B-Instruct \
    --tensor-parallel-size 8 \
    --api-keys "sk-empty"

@AlpinDale
Member

Can you add --distributed-executor-backend=mp to the launch flags?
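
Applied to the README command above, that would look roughly like this (untested; every other flag kept as quoted):

    docker run --runtime nvidia --gpus all \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        -p 2242:2242 \
        --ipc=host \
        alpindale/aphrodite-openai:latest \
        --model NousResearch/Meta-Llama-3.1-8B-Instruct \
        --tensor-parallel-size 8 \
        --api-keys "sk-empty" \
        --distributed-executor-backend=mp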

@baditaflorin

By default, I get the same error:

docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7" \
    -p 2242:2242 \
    --ipc=host \
    alpindale/aphrodite-openai:latest \
    --model NousResearch/Meta-Llama-3.1-8B-Instruct \
    --tensor-parallel-size 8 \
    --api-keys "sk-empty"
Unable to find image 'alpindale/aphrodite-openai:latest' locally
latest: Pulling from alpindale/aphrodite-openai
3c645031de29: Pull complete
0d6448aff889: Pull complete
0a7674e3e8fe: Pull complete
b71b637b97c5: Pull complete
56dc85502937: Pull complete
c1c890480c74: Pull complete
93929e83ed21: Pull complete
0ead3d2f76c1: Pull complete
60cdee2e316d: Pull complete
518f3d7cac80: Pull complete
336c5995c4b2: Pull complete
Digest: sha256:8bac4170be255c19d29d84ffbdeabdc1b0a09ee511bec7ed0026e349db430357
Status: Downloaded newer image for alpindale/aphrodite-openai:latest
INFO:     Multiprocessing frontend to use ipc:///tmp/535bb624-82bb-42e8-bbc7-5ea63814857e for RPC Path.
INFO:     Started engine process with PID 46
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/executor/ray_utils.py", line 13, in <module>
    import ray
ModuleNotFoundError: No module named 'ray'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/endpoints/openai/rpc/server.py", line 214, in run_rpc_server
    server = AsyncEngineRPCServer(async_engine_args, rpc_path)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/endpoints/openai/rpc/server.py", line 29, in __init__
    self.engine = AsyncAphrodite.from_engine_args(async_engine_args)
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/async_aphrodite.py", line 703, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/engine/args_tools.py", line 936, in create_engine_config
    parallel_config = ParallelConfig(
  File "/usr/local/lib/python3.10/dist-packages/aphrodite/common/config.py", line 963, in __init__
    raise ValueError("Unable to load Ray which is "
ValueError: Unable to load Ray which is required for multi-node inference, please install Ray with `pip install ray`.

With --distributed-executor-backend=mp it seems to work, after I set CUDA_VISIBLE_DEVICES=0 and --tensor-parallel-size 1:

docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "CUDA_VISIBLE_DEVICES=0" \
    -p 2242:2242 \
    --ipc=host \
    alpindale/aphrodite-openai:latest \
    --model NousResearch/Meta-Llama-3.1-8B-Instruct \
    --tensor-parallel-size 1 \
    --api-keys "sk-empty" \
    --distributed-executor-backend=mp
INFO:     Multiprocessing frontend to use ipc:///tmp/6613166f-863d-42db-98cc-5c78ae5f00a4 for RPC Path.
INFO:     Started engine process with PID 44
WARNING:  The model has a long context length (131072). This may cause OOM
errors during the initial memory profiling phase, or result in low performance
due to small KV cache space. Consider setting --max-model-len to a smaller
value.
INFO:     -------------------------------------------------------------------------------------
INFO:     Initializing Aphrodite Engine (v0.6.4.post1 commit 20f11fd0) with the
following config:
INFO:     Model = 'NousResearch/Meta-Llama-3.1-8B-Instruct'
INFO:     DataType = torch.bfloat16
INFO:     Tensor Parallel Size = 1
INFO:     Pipeline Parallel Size = 1
INFO:     Disable Custom All-Reduce = False
INFO:     Context Length = 131072
INFO:     Enforce Eager Mode = False
INFO:     Prefix Caching = False
INFO:     Device = device(type='cuda')
INFO:     Guided Decoding Backend = DecodingConfig(guided_decoding_backend='lm-format-enforcer')
INFO:     -------------------------------------------------------------------------------------
WARNING:  Reducing Torch parallelism from 12 threads to 1 to avoid unnecessary
CPU contention. Set OMP_NUM_THREADS in the external environment to tune this
value as needed.
INFO:     Loading model NousResearch/Meta-Llama-3.1-8B-Instruct...
INFO:     Using model weights format ['*.safetensors']
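
Presumably the mp backend also covers single-node multi-GPU tensor parallelism without Ray; an untested variant of the command above for two GPUs, keeping --tensor-parallel-size equal to the number of visible devices:

    docker run --runtime nvidia --gpus all \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "CUDA_VISIBLE_DEVICES=0,1" \
        -p 2242:2242 \
        --ipc=host \
        alpindale/aphrodite-openai:latest \
        --model NousResearch/Meta-Llama-3.1-8B-Instruct \
        --tensor-parallel-size 2 \
        --api-keys "sk-empty" \
        --distributed-executor-backend=mp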
