Training qwen1.5-moe-A2.7B-chat is slow, with low GPU utilization #868
Comments
This slowness is not normal. Check the model's device placement to see whether any parts are being offloaded to the CPU.
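The suggestion above (check whether model weights are being offloaded to CPU) can be sketched as a small PyTorch helper; the `device_summary` function and the toy `nn.Linear` module are illustrative, not part of the original report. For models loaded via `transformers` with a `device_map`, inspecting `model.hf_device_map` serves the same purpose.

```python
from collections import Counter

import torch.nn as nn


def device_summary(model: nn.Module) -> Counter:
    """Count parameters per device to spot CPU offloading.

    If any key other than 'cuda:*' shows up with a large count,
    part of the model is running on the CPU, which would explain
    slow training and low GPU utilization.
    """
    counts = Counter()
    for p in model.parameters():
        counts[str(p.device)] += p.numel()
    return counts


# Toy example: a small module, which defaults to the CPU.
toy = nn.Linear(4, 4)
summary = device_summary(toy)
print(summary)  # all 20 parameters (16 weights + 4 biases) on 'cpu'
```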
CUDA_VISIBLE_DEVICES=0 \
python3 llm_sft.py \
    --model_type qwen1half-moe-a2_7b-chat \
    --model_id_or_path /root/yovole/qwen/Qwen1.5-MoE-A2.7B-Chat \
    --sft_type lora \
    --tuner_backend swift \
    --dtype AUTO \
    --output_dir output \
    --dataset dureader-robust-zh \
    --train_dataset_sample 10000 \
    --num_train_epochs 1 \
    --max_length 1024 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn true \
    --self_cognition_sample 1000 \
    --custom_train_dataset_path /root/yovole/qwen/data/alpaca-gpt4-data-zh/alpaca_gpt4_data_zh.json \
    --custom_val_dataset_path /root/yovole/qwen/data/alpaca-gpt4-data-zh/alpaca_gpt4_data_zh.json \
    --model_name 卡卡罗特 \
    --model_author 陶白白
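For reference, the effective batch size implied by the flags above is `--batch_size` times `--gradient_accumulation_steps`; a quick check (the variable names here are illustrative):

```python
# Values taken from the command-line flags above.
batch_size = 1                    # --batch_size
gradient_accumulation_steps = 16  # --gradient_accumulation_steps

# Each optimizer step sees this many samples in total.
effective_batch_size = batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16
```

With a per-device batch size of 1, throughput hinges almost entirely on per-sample forward/backward speed, so any CPU offloading hurts disproportionately.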