Description
I have 8 GPUs with 24 GB of memory each. Here is the command I run:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python web_demo.py \
    --model_name "deepseek-ai/deepseek-vl2" \
    --port 37914
The output is:
......
deepseek-ai/deepseek-vl2/model-00008-of-000008.safetensors
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████| 8/8 [00:14<00:00, 1.85s/it]
After the checkpoint shards finish loading (100%), it crashes with CUDA out of memory:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 23.55 GiB total capacity; 17.98 GiB already allocated; 18.81 MiB free; 23.08 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
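The message suggests trying max_split_size_mb; as far as I understand from the PyTorch docs, it would be set like this, though with ~18 GiB already allocated on GPU 0 I assume fragmentation is not the real problem:

```
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python web_demo.py \
    --model_name "deepseek-ai/deepseek-vl2" \
    --port 37914
```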
It seems that only one GPU is used, so CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 did not spread the model across the cards. I want to know what I need to change, thank you.
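My guess is that CUDA_VISIBLE_DEVICES only makes the GPUs visible and does not shard the model by itself, so something like device_map="auto" is required. Below is a minimal sketch of what I suspect is needed, assuming the checkpoint can be loaded through transformers with trust_remote_code (web_demo.py may construct the model differently):

```python
# A sketch of my assumed fix, not the official demo code: shard the model
# across all visible GPUs instead of placing it entirely on GPU 0.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-vl2",
    torch_dtype=torch.bfloat16,  # half precision so the shards fit in 24 GB cards
    device_map="auto",           # let accelerate split the layers across GPUs
    trust_remote_code=True,      # the repo ships custom model classes
)
```

Is this the right direction, or does web_demo.py already expose an option for multi-GPU loading?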