Description
I have 8 GPUs with 24 GB of memory each. Here is the command I run:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python web_demo.py \
    --model_name "deepseek-ai/deepseek-vl2" \
    --port 37914
The output is:
......
deepseek-ai/deepseek-vl2/model-00008-of-000008.safetensors
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 8/8 [00:14<00:00, 1.85s/it]
After the checkpoint shards finish loading (100%), it crashes with CUDA out of memory:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 23.55 GiB total capacity; 17.98 GiB already allocated; 18.81 MiB free; 23.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
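The message suggests trying max_split_size_mb; as far as I understand from the PyTorch docs, it would be set like this, though with ~18 GiB already allocated on GPU 0 I assume fragmentation is not the real problem:

```
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python web_demo.py \
    --model_name "deepseek-ai/deepseek-vl2" \
    --port 37914
```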
It seems that only one GPU is used, so CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 did not spread the model across the cards. I want to know what I need to change, thank you.
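My guess is that CUDA_VISIBLE_DEVICES only makes the GPUs visible and does not shard the model by itself, so something like device_map="auto" is required. Below is a minimal sketch of what I suspect is needed, assuming the checkpoint can be loaded through transformers with trust_remote_code (web_demo.py may construct the model differently):

```python
# A sketch of my assumed fix, not the official demo code: shard the model
# across all visible GPUs instead of placing it entirely on GPU 0.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-vl2",
    torch_dtype=torch.bfloat16,  # half precision so the shards fit in 24 GB cards
    device_map="auto",           # let accelerate split the layers across GPUs
    trust_remote_code=True,      # the repo ships custom model classes
)
```

Is this the right direction, or does web_demo.py already expose an option for multi-GPU loading?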