Qwen2-VL SFT training question #2853

Open
rover5056 opened this issue Jan 3, 2025 · 4 comments

Comments

@rover5056

When training Qwen2-VL or another MLLM on a mix of image-text data and text-only data:
For a text-only batch, does ms-swift update the visual modules, i.e. the weights of the ViT or the projector?

@Jintao-Huang
Collaborator

Jintao-Huang commented Jan 4, 2025

@rover5056
Author

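What I mean is: with everything unfrozen, i.e. all parameters set to trainable, and training on image-text data together with text-only data,

do gradients propagate back to the ViT for the text-only samples? Does the ViT get updated on that data?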

@Jintao-Huang
Collaborator

No, it is not updated.

@tbozhong

tbozhong commented Jan 6, 2025

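With text-only data, the ViT and projector are never activated, so their parameters are not in the computation graph. In that case gradients would not propagate back to the ViT, right?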
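
For anyone who wants to check this reasoning, here is a minimal PyTorch sketch. It uses a toy model, not Qwen2-VL or ms-swift's actual code: `ToyMLLM`, its layer sizes, and `grads_after_backward` are all hypothetical names made up for illustration. The only assumption carried over from the thread is that the vision branch is skipped entirely when a batch has no images.

```python
import torch
from torch import nn


class ToyMLLM(nn.Module):
    """Hypothetical stand-in for a multimodal model; NOT Qwen2-VL itself."""

    def __init__(self, dim: int = 8, vocab: int = 16):
        super().__init__()
        self.vit = nn.Linear(dim, dim)        # stand-in for the vision tower
        self.projector = nn.Linear(dim, dim)  # stand-in for the projector
        self.embed = nn.Embedding(vocab, dim)
        self.lm = nn.Linear(dim, vocab)       # stand-in for the language model

    def forward(self, input_ids, pixel_values=None):
        hidden = self.embed(input_ids)  # (batch, seq, dim)
        if pixel_values is not None:
            # Assumption: the vision branch only runs when images are present.
            image_feats = self.projector(self.vit(pixel_values))
            hidden = torch.cat([image_feats, hidden], dim=1)
        return self.lm(hidden)


def grads_after_backward(model, **inputs):
    """Run one forward/backward pass and report which parameters got a gradient."""
    model.zero_grad(set_to_none=True)
    model(**inputs).sum().backward()
    return {name: p.grad is not None for name, p in model.named_parameters()}


torch.manual_seed(0)
model = ToyMLLM()
ids = torch.randint(0, 16, (2, 5))

# Text-only batch: vit/projector never enter the graph, so their .grad stays None.
text_only = grads_after_backward(model, input_ids=ids)

# Image-text batch: vit/projector take part in the forward pass and get gradients.
mixed = grads_after_backward(model, input_ids=ids, pixel_values=torch.randn(2, 3, 8))

print("text-only:", text_only)
print("mixed:    ", mixed)
```

In the text-only case the `vit.*` and `projector.*` entries come out `False`, and PyTorch's built-in optimizers skip any parameter whose `.grad` is `None`, so those weights are left untouched for that step. This is a single-process toy. It does not show how ms-swift handles distributed training or a batch that mixes image and text samples.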


3 participants