I encountered an issue while fine-tuning with the officially released code using DeepSpeed. Here is the detailed error message:
File "/lib/python3.11/site-packages/deepspeed/runtime/zero/linear.py", line 57, in forward
output = input.matmul(weight.t())
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16
It appears that the matmul operation expects both input tensors to have the same dtype. However, in my case, one of the tensors is float32 and the other is bfloat16.
I am not sure whether this is a bug in the DeepSpeed library or an issue with my usage. I would appreciate any assistance in resolving it.
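The mismatch can be reproduced in isolation: a float32 activation multiplied by a bfloat16 weight raises exactly this `RuntimeError`, and casting one side to the other's dtype resolves it. A minimal sketch (the tensor shapes here are arbitrary, not from the DeepSpeed code path):

```python
import torch

x = torch.randn(2, 4, dtype=torch.float32)    # e.g. hidden states upcast to float32
w = torch.randn(8, 4, dtype=torch.bfloat16)   # model weights loaded in bf16

try:
    x.matmul(w.t())                           # mixed dtypes: raises RuntimeError
except RuntimeError as e:
    print("error:", e)

# Casting the input to the weight's dtype makes the matmul succeed:
out = x.to(w.dtype).matmul(w.t())
print(out.dtype)  # torch.bfloat16
```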
I encountered the same problem, and here's how I solved it: modify lines 425 and 428 in the modeling_deepseek.py file and remove the torch.float32 casts, as in the following code.
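For readers without the file at hand, the idea is that the MoE gate upcasts its inputs and weights to float32 before the routing linear, which then collides with bf16 tensors under DeepSpeed's patched `Linear`. The sketch below is illustrative only: the variable names and shapes are assumptions, not the exact modeling_deepseek.py source at those line numbers.

```python
import torch
import torch.nn.functional as F

# Original (schematic) gate code upcasts both operands to float32:
#   logits = F.linear(hidden_states.type(torch.float32),
#                     self.weight.type(torch.float32), None)
# Removing the torch.float32 casts keeps the matmul in the model's dtype:

hidden_states = torch.randn(4, 16, dtype=torch.bfloat16)  # bf16 activations (assumed shape)
gate_weight = torch.randn(8, 16, dtype=torch.bfloat16)    # bf16 router weight (assumed shape)

logits = F.linear(hidden_states, gate_weight, None)       # both bf16: no dtype error
scores = logits.softmax(dim=-1)
print(scores.dtype)  # torch.bfloat16
```

Note that the float32 upcast exists to keep the routing softmax numerically stable, so removing it trades a little gate precision for compatibility with DeepSpeed's bf16 path.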