How does training and inference efficiency compare to other MLLMs? #20

Open
rushzy opened this issue Dec 25, 2024 · 0 comments
rushzy commented Dec 25, 2024

The model size of DeepSeek-VL2 is 27.5B, and LoRA fine-tuning has about 212M trainable parameters, roughly 9 times the trainable parameters of LoRA fine-tuning InternVL2-8B (23M).

On the same data, training takes about 20 times as long as with InternVL2-8B. Is this in line with expectations?
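For reference, a rough way to cross-check where the 212M trainable parameters come from is to apply a PEFT LoRA config with the same rank/alpha and count trainable parameters directly. This is only a sketch: the loader class and the target_modules below are my assumptions, not necessarily what swift configures internally for DeepSeek-VL2.

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

def count_trainable(model: torch.nn.Module) -> int:
    # Only parameters with requires_grad=True (the LoRA adapters) are counted.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Assumption: loading through AutoModelForCausalLM with trust_remote_code;
# DeepSeek-VL2 may need its own model class / processor in practice.
base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-vl2",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

# Assumption: target_modules is a placeholder; swift may inject LoRA into a
# different (and larger) set of linear layers, which would change the count.
cfg = LoraConfig(r=8, lora_alpha=12, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(base, cfg)

print(f"trainable params: {count_trainable(peft_model) / 1e6:.1f}M "
      f"of {sum(p.numel() for p in peft_model.parameters()) / 1e9:.1f}B total")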

I use the swift framework for fine-tuning; the training script is as follows:

nproc_per_node=2

CUDA_VISIBLE_DEVICES=0,1 \
NPROC_PER_NODE=$nproc_per_node \
swift sft \
    --model /mnt/dolphinfs/hdd_pool/docker/user/rushzy/deepseek-ai/deepseek-vl2 \
    --train_type lora \
    --dataset /mnt/dolphinfs/hdd_pool/docker/user/rushzy/data/train.jsonl \
    --num_train_epochs 3 \
    --learning_rate 8e-5 \
    --lora_rank 8 \
    --lora_alpha 12 \
    --max_length 4096 \
    --save_only_model True \
    --eval_steps 2000 \
    --save_steps 2000 \
    --save_total_limit -1 \
    --output_dir /mnt/dolphinfs/hdd_pool/docker/user/rushzy/output/test \
    --deepspeed ./ds_configs/ds_zero3_cosine.json \
    --lazy_tokenize True \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4
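For reference, the effective batch size of this run is nproc_per_node × per_device_train_batch_size × gradient_accumulation_steps = 2 × 2 × 4 = 16 samples per optimizer step.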