
VRAM requirement for full sft deepseek VL 7B #860

Open
SinanAkkoyun opened this issue May 1, 2024 · 0 comments
SinanAkkoyun commented May 1, 2024

Describe the bug
How much VRAM is needed to fully fine-tune the 7B VL model?

# Experimental Environment: A100
# GPU Memory Requirement: 80GB
# Runtime: 2.5 hours
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type full \
    --output_dir output \
    --eval_steps 500 \

The docs say 80 GB is enough for a normal 7B model, but when I try to train DeepSeek VL 7B on our research rig with a single A100, I get an OOM. When I try to split the model across 4 GPUs (one A100 and three 4090s), the A100 is not utilized and the three 4090s OOM before training even starts.
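For reference, a sketch of how the same docs recipe might be adapted for a multi-GPU run (untested on this hardware; the `deepseek-vl-7b-chat` model type and the `coco-mini-en` dataset name are assumptions taken from ms-swift's model/dataset listings, and the split relies on ms-swift sharding the model across all visible GPUs via its default device map):

```shell
# Hedged sketch, not a verified configuration: expose all four GPUs so
# ms-swift can shard the model across them instead of using GPU 0 only.
# Model type and dataset are assumed names; substitute your own dataset.
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type deepseek-vl-7b-chat \
    --dataset coco-mini-en \
    --num_train_epochs 5 \
    --sft_type full \
    --output_dir output \
    --eval_steps 500
```

Note that with heterogeneous cards (80 GB A100 plus 24 GB 4090s), an even layer split can still OOM on the smaller cards, which may explain the behavior described above.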

@SinanAkkoyun changed the title from "VRAM requirement for full sft deepseek 7B" to "VRAM requirement for full sft deepseek VL 7B" on May 1, 2024