How does training and inference efficiency compare to other MLLMs? #20

Open
rushzy opened this issue Dec 25, 2024 · 0 comments
rushzy commented Dec 25, 2024

The model size of DeepSeek-VL2 is 27.5B, and LoRA fine-tuning has about 212M trainable parameters, roughly 9 times the trainable parameters of LoRA fine-tuning InternVL2-8B (23M).

On the same data, training takes about 20 times as long as with InternVL2-8B. Is this in line with expectations?
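For reference, a rough way to cross-check where the 212M trainable parameters come from is to apply a PEFT LoRA config with the same rank/alpha and count trainable parameters directly. This is only a sketch: the loader class and the target_modules below are my assumptions, not necessarily what swift configures internally for DeepSeek-VL2.

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

def count_trainable(model: torch.nn.Module) -> int:
    # Only parameters with requires_grad=True (the LoRA adapters) are counted.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Assumption: loading through AutoModelForCausalLM with trust_remote_code;
# DeepSeek-VL2 may need its own model class / processor in practice.
base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-vl2",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

# Assumption: target_modules is a placeholder; swift may inject LoRA into a
# different (and larger) set of linear layers, which would change the count.
cfg = LoraConfig(r=8, lora_alpha=12, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(base, cfg)

print(f"trainable params: {count_trainable(peft_model) / 1e6:.1f}M "
      f"of {sum(p.numel() for p in peft_model.parameters()) / 1e9:.1f}B total")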

I use the swift framework for fine-tuning; the training script is as follows:

nproc_per_node=2

CUDA_VISIBLE_DEVICES=0,1 \
NPROC_PER_NODE=$nproc_per_node \
swift sft \
    --model /mnt/dolphinfs/hdd_pool/docker/user/rushzy/deepseek-ai/deepseek-vl2 \
    --train_type lora \
    --dataset /mnt/dolphinfs/hdd_pool/docker/user/rushzy/data/train.jsonl \
    --num_train_epochs 3 \
    --learning_rate 8e-5 \
    --lora_rank 8 \
    --lora_alpha 12 \
    --max_length 4096 \
    --save_only_model True \
    --eval_steps 2000 \
    --save_steps 2000 \
    --save_total_limit -1 \
    --output_dir /mnt/dolphinfs/hdd_pool/docker/user/rushzy/output/test \
    --deepspeed ./ds_configs/ds_zero3_cosine.json \
    --lazy_tokenize True \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4
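For reference, the effective batch size of this run is nproc_per_node × per_device_train_batch_size × gradient_accumulation_steps = 2 × 2 × 4 = 16 samples per optimizer step.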