1. With ZeRO-3 enabled, training stalls for an extremely long time when it reaches the second training sample; with ZeRO-2 this problem does not occur.
2. Training is very slow: fine-tuning the 27.5B deepseek-vl2 on 8x A100 GPUs is roughly 40x slower than fine-tuning the 8B InternVL2.
deepseek-vl2: (training-speed screenshot attached in the original issue)
InternVL2: (training-speed screenshot attached in the original issue)
The training script used is as follows:

nproc_per_node=8

NPROC_PER_NODE=$nproc_per_node \
MASTER_PORT=29502 \
swift sft \
    --model /mnt/dolphinfs/hdd_pool/docker/user/wzy/huggingface.co/deepseek-ai/deepseek-vl2 \
    --train_type lora \
    --dataset /mnt/dolphinfs/hdd_pool/docker/user/wzy/data/train.jsonl \
    --num_train_epochs 3 \
    --learning_rate 8e-5 \
    --lora_rank 8 \
    --lora_alpha 12 \
    --max_length 4096 \
    --lazy_tokenize True \
    --save_only_model True \
    --eval_steps 2000 \
    --save_steps 2000 \
    --save_total_limit -1 \
    --output_dir /mnt/dolphinfs/hdd_pool/docker/user/wzy/output_wzy/test/deepseek_vl2 \
    --deepspeed /mnt/dolphinfs/hdd_pool/docker/user/wzy/deepseek_vl2/ds_configs/ds_zero3_cosine.json \
    --per_device_train_batch_size 2 \
    --torch_dtype bfloat16 \
    --logging_steps 5 \
    --dataloader_num_workers 24
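The ds_zero3_cosine.json referenced above is not included in the report. For context, a minimal ZeRO-3 DeepSpeed config of the kind passed to --deepspeed might look like the sketch below; the file name and values are assumptions rather than the reporter's actual config, and overlap_comm plus the stage3_* prefetch/persistence settings are the knobs most commonly tuned when ZeRO-3 parameter gathering makes individual steps slow.

# Hypothetical stand-in for ds_zero3_cosine.json (the real file is not shown in this issue).
# "auto" values are resolved by the transformers/swift DeepSpeed integration.
cat > ds_zero3_example.json <<'EOF'
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
EOF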
MoE models are always slow when trained through the transformers ecosystem; using Megatron is recommended instead.
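For reference, ms-swift's Megatron-SWIFT path first converts the Hugging Face checkpoint to Megatron (mcore) format and then trains with the megatron CLI. The sketch below follows that documented two-step flow, but the exact flag names, the parallelism values, and whether deepseek-vl2 is supported by the Megatron backend at all are assumptions that should be verified against the ms-swift documentation for the installed version.

# Hypothetical sketch of the Megatron-SWIFT flow; verify flags and deepseek-vl2 support
# against the ms-swift version in use before relying on this.

# 1. Convert the Hugging Face weights to Megatron (mcore) format.
CUDA_VISIBLE_DEVICES=0 swift export \
    --model deepseek-ai/deepseek-vl2 \
    --to_mcore true \
    --torch_dtype bfloat16 \
    --output_dir deepseek-vl2-mcore

# 2. Fine-tune with the Megatron backend; expert parallelism keeps the MoE layers sharded.
NPROC_PER_NODE=8 \
megatron sft \
    --load deepseek-vl2-mcore \
    --dataset /path/to/train.jsonl \
    --tensor_model_parallel_size 2 \
    --expert_model_parallel_size 4 \
    --micro_batch_size 1 \
    --global_batch_size 16 \
    --save megatron_output/deepseek-vl2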
Does swift currently support training with Megatron?