Converting DeepSeek-V2-Lite from HF to Megatron as described in the tutorial, I get the following errors:
layer:26, layer_in, diff: 20480, diff>1e-05:[10196/20480] diff_max:57.86997985839844
layer:26, q_proj_in, diff: 20480, diff>1e-05:[10286/20480] diff_max:8.381954193115234
layer:26, q_proj_out, diff: 30720, diff>1e-05:[15446/30720] diff_max:2.5908823013305664
layer:26, q_proj_out_weight, diff: 0, diff>1e-05:[0/6291456] diff_max:0.0
layer:26, kv_a_proj_in, diff: 20480, diff>1e-05:[10286/20480] diff_max:8.381954193115234
layer:26, kv_a_proj_out, diff: 5760, diff>1e-05:[2983/5760] diff_max:12.524017333984375
layer:26, kv_a_proj_out_weight, diff: 0, diff>1e-05:[0/1179648] diff_max:0.0
layer:26, kv_a_norm_in, diff: 5120, diff>1e-05:[2652/5120] diff_max:12.524017333984375
layer:26, kv_a_norm_out, diff: 5120, diff>1e-05:[2624/5120] diff_max:3.7540347576141357
layer:26, kv_a_norm_out_weight, diff: 0, diff>1e-05:[0/512] diff_max:0.0
layer:26, kv_b_proj_in, diff: 5120, diff>1e-05:[2624/5120] diff_max:3.7540347576141357
layer:26, kv_b_proj_out, diff: 40960, diff>1e-05:[20437/40960] diff_max:3.1056199073791504
layer:26, kv_b_proj_out_weight, diff: 0, diff>1e-05:[0/2097152] diff_max:0.0
layer:26, o_proj_in, diff: 20480, diff>1e-05:[10081/20480] diff_max:1.679904580116272
layer:26, o_proj_out, diff: 20480, diff>1e-05:[9964/20480] diff_max:3.2377185821533203
layer:26, o_proj_out_weight, diff: 0, diff>1e-05:[0/4194304] diff_max:0.0
layer:26, attn_out, diff: 20480, diff>1e-05:[9964/20480] diff_max:3.2377185821533203
layer:26, shared_experts_down_proj_in, diff: 28160, diff>1e-05:[14102/28160] diff_max:42.54461669921875
layer:26, shared_experts_down_proj_out, diff: 20480, diff>1e-05:[9978/20480] diff_max:113.819580078125
layer:26, shared_experts_down_proj_out_weight, diff: 0, diff>1e-05:[0/5767168] diff_max:0.0
layer:26, lmhead, diff: 1024000, diff>1e-05:[754661/1024000] diff_max:12.326448440551758
layer:26, lmhead_weight, diff: 0, diff>1e-05:[0/209715200] diff_max:0.0
layer:26, lmhead_token, diff: 5, diff>1e-05:[4/10] diff_max:26968
logits: 1024000, diff>1e-05:[754661/1024000] diff_max:12.326448440551758
In particular, the entry `layer:26, shared_experts_down_proj_out, diff: 20480, diff>1e-05:[9978/20480] diff_max:113.819580078125` reaches a maximum difference of 113. Are you sure this is not a problem?
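For reference, here is a minimal sketch of how statistics like `diff>1e-05:[n/total]` and `diff_max` could be computed, together with a relative error that puts an absolute value such as 113 in context against the activation magnitude. This is an assumption about what the log fields mean, not the conversion script's actual code; `diff_stats` and its return layout are hypothetical names, and plain Python lists stand in for tensors.

```python
def diff_stats(ref, test, threshold=1e-5):
    """Compare two equal-length sequences of floats.

    Returns (count_above, total, max_abs_diff, max_rel_diff), where
    max_rel_diff is max_abs_diff divided by the largest |ref| value.
    Hypothetical helper; not taken from the conversion tool.
    """
    if len(ref) != len(test):
        raise ValueError("ref and test must have the same length")

    abs_diffs = [abs(r - t) for r, t in zip(ref, test)]
    count_above = sum(1 for d in abs_diffs if d > threshold)
    max_abs = max(abs_diffs, default=0.0)

    # Scale by the largest reference magnitude so a large absolute
    # difference on large activations is not over-interpreted.
    ref_scale = max((abs(r) for r in ref), default=0.0)
    if ref_scale > 0:
        max_rel = max_abs / ref_scale
    elif max_abs > 0:
        max_rel = float("inf")
    else:
        max_rel = 0.0

    return count_above, len(ref), max_abs, max_rel


if __name__ == "__main__":
    hf_out = [1.0, 2.0, 4.0]        # made-up reference activations
    megatron_out = [1.0, 2.5, 3.0]  # made-up converted-model activations
    above, total, max_abs, max_rel = diff_stats(hf_out, megatron_out)
    print(f"diff>1e-05:[{above}/{total}] diff_max:{max_abs} rel_max:{max_rel}")
```

Reporting a relative figure alongside `diff_max` would make it easier to tell whether 113 is large compared with the values at `shared_experts_down_proj_out`, given that every `*_weight` entry in the log shows a difference of 0.0.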