DeepSeek-VL2 has 27.5B parameters, and LoRA fine-tuning it gives about 212M trainable parameters, roughly 9 times the trainable parameters of LoRA fine-tuning InternVL2-8B (23M).
With the same data, training takes about 20 times as long as InternVL2-8B. Is this in line with expectations?
I am fine-tuning with the swift framework, and the training script is as follows.
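The script itself was not included here; the block below is only a minimal sketch of what a swift LoRA fine-tuning launch for DeepSeek-VL2 typically looks like. The flag names follow ms-swift conventions and may differ between versions, and the model type, dataset path, and hyperparameter values are placeholders, not the settings actually used.

```bash
# Hypothetical example, not the actual script from this issue:
# a typical ms-swift LoRA fine-tuning launch for DeepSeek-VL2.
# Flag names follow ms-swift conventions and may differ by version;
# the dataset path and hyperparameters are placeholders.
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type deepseek-vl2 \
    --sft_type lora \
    --dataset /path/to/train.jsonl \
    --num_train_epochs 1 \
    --max_length 2048 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --batch_size 1 \
    --gradient_accumulation_steps 16 \
    --output_dir output/deepseek-vl2-lora
```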