can't run python web_demo.py with 8 4090 (24GB) cards #86

Closed
@fengyue20

I have 8 GPUs with 24 GB of memory each.
Here is the command I ran:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python web_demo.py \
--model_name "deepseek-ai/deepseek-vl2" \
--port 37914

The output is:
......
deepseek-ai/deepseek-vl2/model-00008-of-000008.safetensors
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 8/8 [00:14<00:00, 1.85s/it]
After "Loading checkpoint shards" reaches 100%, a CUDA out-of-memory error is raised:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 23.55 GiB total capacity; 17.98 GiB already allocated; 18.81 MiB free; 23.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
It seems that only one GPU is used and that setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 had no effect.
What do I need to change? Thank you.
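
For context, CUDA_VISIBLE_DEVICES only controls which GPUs the process can see; it does not by itself split a model across them, which would be consistent with the error above naming only GPU 0. One common approach is to give each layer an explicit device index. The following is a minimal sketch of that idea, not the repository's own method: the function name `build_device_map`, the `model.layers` prefix, and the layer count of 30 are all hypothetical placeholders, and whether web_demo.py can accept such a map is an assumption.

```python
from typing import Dict


def build_device_map(num_layers: int, num_gpus: int,
                     prefix: str = "model.layers") -> Dict[str, int]:
    """Spread layers over GPUs in contiguous, near-equal blocks.

    NOTE: `prefix` is a hypothetical module path, not taken from the
    deepseek-vl2 checkpoint; the real module names must be looked up.
    """
    if num_layers <= 0 or num_gpus <= 0:
        raise ValueError("num_layers and num_gpus must be positive")
    device_map = {}
    for i in range(num_layers):
        # Integer scaling maps layer index i in [0, num_layers)
        # to a GPU index in [0, num_gpus).
        device_map[f"{prefix}.{i}"] = i * num_gpus // num_layers
    return device_map


if __name__ == "__main__":
    # 30 layers is an assumed figure used only for illustration.
    for name, gpu in build_device_map(30, 8).items():
        print(name, "-> GPU", gpu)
```

A complete map would also need entries for the modules outside the layer stack (embeddings, vision encoder, output head), which this sketch leaves out.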
