
Freezing part of the parameters fails during full-parameter fine-tuning #2872

Open
xiaosongyuan opened this issue Jan 7, 2025 · 4 comments

Comments

@xiaosongyuan

During full-parameter fine-tuning of llama3-8b-base, I tried to replace the freeze_parameters() function in swift/llm/train/tuner.py with a custom freeze function that works at the granularity of individual elements of a parameter matrix.
The command line is as follows:
```shell
sh requirements/install_all.sh
pip install transformers==4.46.3
pip install -e .

export MKL_THREADING_LAYER=pthreads
nproc_per_node=8

NPROC_PER_NODE=$nproc_per_node \
MASTER_PORT=29500 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift sft \
    --model_type llama3 \
    --model llama3-8b/infinity_code_20k/v0-20241228-012227/checkpoint-231 \
    --model_revision master \
    --train_type full \
    --tuner_backend peft \
    --template_backend swift \
    --system "" \
    --torch_dtype bfloat16 \
    --dataset custom_datasets/sft_data/infinity_code_20k.jsonl \
    --dataset_num_proc 8 \
    --output_dir ckpt_sft_seq/infinity_math_code_20k \
    --ddp_backend nccl \
    --num_train_epochs 5 \
    --max_length 2048 \
    --truncation_strategy delete \
    --packing false \
    --gradient_checkpointing true \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --weight_decay 0.1 \
    --learning_rate 2e-5 \
    --lr_scheduler_type cosine \
    --gradient_accumulation_steps $(expr 64 / $nproc_per_node) \
    --max_grad_norm 1.0 \
    --warmup_ratio 0.03 \
    --eval_steps 40 \
    --save_steps 40 \
    --save_total_limit 6 \
    --logging_steps 1 \
    --deepspeed 'zero3' \
    --save_only_model true
```

However, while debugging I found that by the time the custom function (or the original freeze_parameters()) is executed, the model seems to have already been sharded by DeepSpeed: the parameter names from model.named_parameters() can still be retrieved, but parameter.shape is already 0:
[screenshot omitted]

Does ms-swift support freezing a parameter matrix at the row, column, or element level under the DeepSpeed framework?
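
For context on the empty shapes: ZeRO-3 replaces each parameter's storage with its local shard and records the full shape in a `ds_*` attribute, so partitioning can be confirmed directly. A minimal inspection sketch, assuming `model` has already been wrapped by DeepSpeed with stage 3:

```python
# Inspect ZeRO-3 partitioning state (sketch; assumes DeepSpeed stage-3 init).
for name, param in model.named_parameters():
    if hasattr(param, "ds_shape"):
        # param.shape is the local partition (often torch.Size([0]));
        # param.ds_shape is the full, unpartitioned shape.
        print(name, tuple(param.shape), tuple(param.ds_shape))
```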

@Jintao-Huang
Collaborator

That's fine; the parameters are on the other GPUs.
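
To illustrate (this is DeepSpeed's API, not something ms-swift adds): a partitioned parameter can be temporarily reassembled with the GatheredParameters context manager, after which its full shape is visible again. A minimal sketch:

```python
import deepspeed

# Gather one ZeRO-3 partitioned parameter onto every rank; with
# modifier_rank=0, in-place edits made by rank 0 are kept when the
# context exits and the parameter is re-partitioned.
with deepspeed.zero.GatheredParameters([param], modifier_rank=0):
    print(param.shape)  # full shape inside the context
```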

@xiaosongyuan
Author

But when I try to set requires_grad to False for the elements in row 1700 of the parameter model.layers.23.mlp.gate.up_proj, I get a "1700 out of index" error. Is my freeze function placed in the wrong spot?

@Jintao-Huang
Collaborator

Individual columns cannot be frozen.
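
That matches PyTorch itself: requires_grad is a per-tensor flag, and it cannot be toggled on a slice of a leaf parameter. The usual workaround is to mask the gradient instead. A sketch for plain PyTorch, not an ms-swift API, and note that gradient hooks may not behave as expected once ZeRO-3 partitions the gradients:

```python
import torch

def freeze_rows(param: torch.nn.Parameter, rows) -> None:
    # requires_grad cannot be set on a slice of a leaf tensor, so we
    # zero out the gradient of the selected rows after backward instead.
    mask = torch.ones_like(param)
    mask[rows] = 0.0
    param.register_hook(lambda grad: grad * mask)

# Hypothetical usage for the row from this thread:
# freeze_rows(some_linear.weight, [1700])
```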

@xiaosongyuan
Author

I just tried it: inside a `with deepspeed.zero.GatheredParameters(param):` context it seems possible to change requires_grad for individual rows or columns, but I don't know whether it actually takes effect. Could this be implemented in ms-swift with a custom AdamW, similar to Qwen's online-merging code? https://github.com/QwenLM/online_merging_optimizers/blob/main/online_merging/optimizers.py
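
For what it's worth, the optimizer-level route can be sketched as an AdamW subclass that masks the gradient before the step and restores the frozen entries afterwards (so decoupled weight decay cannot shrink them). This is a plain-PyTorch sketch with a hypothetical MaskedAdamW name, not the online-merging code itself; under ZeRO-3 the optimizer actually sees flattened fp32 shards, so the masks would have to be built per-shard:

```python
import torch
from torch.optim import AdamW

class MaskedAdamW(AdamW):
    """AdamW that skips updates for entries where mask == 0 (sketch).

    `masks` maps a parameter tensor to a 0/1 tensor of the same shape.
    """

    def __init__(self, params, masks, **kwargs):
        super().__init__(params, **kwargs)
        self.masks = masks

    def step(self, closure=None):
        saved = []
        with torch.no_grad():
            for group in self.param_groups:
                for p in group["params"]:
                    mask = self.masks.get(p)
                    if mask is None:
                        continue
                    saved.append((p, mask, p.detach().clone()))
                    if p.grad is not None:
                        p.grad.mul_(mask)  # no gradient for frozen entries
        loss = super().step(closure)
        with torch.no_grad():
            # Restore frozen entries so AdamW's decoupled weight decay
            # cannot change them either.
            for p, mask, old in saved:
                p.mul_(mask).add_(old * (1 - mask))
        return loss
```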
