
Multi-node multi-GPU training of qwen2-vl #2863

Open
wolfworld6 opened this issue Jan 6, 2025 · 5 comments

Comments

@wolfworld6

After training qwen2-vl-7B-Instruct on two 8-GPU machines, why are there 15 checkpoints?
[screenshot: checkpoint directory listing]

@Jintao-Huang
Collaborator

Those files store the random seed (RNG state); the actual checkpoint is checkpoint-766.

@wolfworld6
Author

> Those files store the random seed (RNG state); the actual checkpoint is checkpoint-766.

The screenshot above shows the contents of the checkpoint-750 directory, and the checkpoint-766 directory looks the same. After merging, the model is split into shards 00001 through 00031, which I don't understand.
[screenshot: checkpoint-750 directory contents]

@Jintao-Huang
Collaborator

adapter_model.safetensors contains the LoRA incremental (delta) weights.

@wolfworld6
Author

wolfworld6 commented Jan 7, 2025

> adapter_model.safetensors contains the LoRA incremental (delta) weights.

Right. What is the command for merging? I tried a few and none of them worked, e.g.
`export --model_type --ckpt_dir --merge_lora true`
produced 31 model shards:
[screenshot: merged output directory containing 31 model shards]
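For context on the command being attempted above, a typical LoRA merge invocation with the ms-swift CLI looks roughly like the sketch below. This is only an illustrative, hedged example: the exact flag names differ between ms-swift versions, and the model_type value and checkpoint path are placeholders rather than values taken from this issue.

```bash
# Hedged sketch of a LoRA merge with the ms-swift CLI.
# Flag names vary across ms-swift versions; the model_type and the
# checkpoint path below are placeholders, not values from this issue.
swift export \
    --model_type qwen2-vl-7b-instruct \
    --ckpt_dir output/qwen2-vl-7b-instruct/checkpoint-766 \
    --merge_lora true
```

A surprisingly large shard count after merging is what prompts the question in the next comment about whether the 7B or the 72B model was actually trained.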

@Jintao-Huang
Collaborator

Are you sure you trained the 7B model and not the 72B?
