Some problems when fine-tuning deepseek-vl2 #2883

Open
rushzy opened this issue Jan 8, 2025 · 2 comments

rushzy commented Jan 8, 2025

1. With ZeRO-3 enabled, training stalls for an extremely long time when it reaches the second training sample; with ZeRO-2 this problem does not occur.
[screenshot]

2. Training is very slow: fine-tuning the 27.5B deepseek-vl2 on 8x A100 GPUs is roughly 40x slower than fine-tuning the 8B InternVL2.

deepseek-vl2: [screenshot]

InternVL2: [screenshot]

The training script used is as follows:

nproc_per_node=8

NPROC_PER_NODE=$nproc_per_node \
MASTER_PORT=29502 \
swift sft \
    --model /mnt/dolphinfs/hdd_pool/docker/user/wzy/huggingface.co/deepseek-ai/deepseek-vl2  \
    --train_type lora \
    --dataset /mnt/dolphinfs/hdd_pool/docker/user/wzy/data/train.jsonl \
    --num_train_epochs 3 \
    --learning_rate 8e-5 \
    --lora_rank 8 \
    --lora_alpha 12 \
    --max_length 4096 \
    --lazy_tokenize True \
    --save_only_model True \
    --eval_steps 2000 \
    --save_steps 2000 \
    --save_total_limit -1 \
    --output_dir /mnt/dolphinfs/hdd_pool/docker/user/wzy/output_wzy/test/deepseek_vl2 \
    --deepspeed /mnt/dolphinfs/hdd_pool/docker/user/wzy/deepseek_vl2/ds_configs/ds_zero3_cosine.json \
    --per_device_train_batch_size 2 \
    --torch_dtype bfloat16 \
    --logging_steps 5 \
    --dataloader_num_workers 24
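
Since ZeRO-2 reportedly avoids the stall seen with ZeRO-3, one workaround while the ZeRO-3 issue is investigated is to point --deepspeed at a ZeRO-2 config instead of ds_zero3_cosine.json. Below is a minimal sketch; the file name ds_zero2.json and the specific field values are assumptions, not taken from the original config.

# Hedged sketch: write a minimal DeepSpeed ZeRO-2 config (file name is hypothetical),
# then pass it to swift sft via --deepspeed in place of ds_zero3_cosine.json.
cat > ds_zero2.json <<'EOF'
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
EOF
# In the command above, replace the --deepspeed argument with:
#     --deepspeed ./ds_zero2.json \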

Jintao-Huang (Collaborator) commented:

MoE models are generally slow when trained through the transformers ecosystem; Megatron is recommended instead.


rushzy commented Jan 9, 2025

Does swift currently support training with Megatron?
