I run a Megatron SFT training job (DP=8, PP=1, TP=1), but the speed is very slow. I profiled the training process and found that the matmul (batch_size=4, hidden_size=4096, sequence_length=4096) calls `cutlass::Kernel<cutlass_75_tensorop_bf16_s1688gemm_bf16_256x128_tn_align1>`. However, if I make copies of the tensors and matmul the copies, the matmul calls `ampere_bf16_s16816gemm_bf16_128x256_ldg8_f2f_stages_64x3_tn` instead. Why is that?
This discussion was converted from issue #997 on August 21, 2024 18:34.
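Below is a minimal repro sketch of one possible explanation, which is an assumption on my part rather than anything stated in the post: the `_align1` suffix on the slow kernel suggests cuBLAS fell back to a 1-element-aligned CUTLASS kernel because an operand's data pointer (or leading dimension) is not 16-byte aligned, whereas a freshly allocated copy is aligned and can use the faster Ampere kernel. The exact kernel names printed will depend on the GPU and cuBLAS version.

```python
# Hypothetical repro sketch (not code from the original setup): compare which
# GEMM kernel is launched for a misaligned bf16 view vs. a fresh copy of it.
# Shapes follow the post (batch=4, sequence=4096, hidden=4096).
import torch
from torch.profiler import profile, ProfilerActivity

assert torch.cuda.is_available()
n = 4 * 4096 * 4096

# Take a view that starts 1 element (2 bytes) into the allocation, so the data
# pointer of `a_misaligned` is no longer 16-byte aligned.
buf = torch.randn(n + 8, dtype=torch.bfloat16, device="cuda")
a_misaligned = buf[1:n + 1].view(4, 4096, 4096)
b = torch.randn(4096, 4096, dtype=torch.bfloat16, device="cuda")

def show_kernels(a, tag):
    # Profile a single matmul and print the top CUDA kernels by time.
    with profile(activities=[ProfilerActivity.CUDA]) as prof:
        torch.matmul(a, b)
        torch.cuda.synchronize()
    print(f"--- {tag} ---")
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=5))

show_kernels(a_misaligned, "misaligned view")      # expected: an *_align1 kernel
show_kernels(a_misaligned.clone(), "fresh copy")   # expected: an ampere_bf16_* kernel
```

If the slow operand really is a misaligned view into a larger parameter or activation buffer inside Megatron, the usual workarounds would be padding that buffer so slices stay 16-byte aligned, or cloning before the matmul at the cost of an extra copy.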