How is partition_avx512 auto-tuned? #2101
Comments
This is based on tuning results on x86 CPUs, so you can change it to whatever works better for the RISC-V processor you're optimizing for.
Yes: we previously tuned this with something like https://github.com/pytorch/FBGEMM/pull/82/files (that one is for avx2). You can adjust it for your customized HW.
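The tuning referenced above is empirical: candidate blocking parameters are benchmarked and the fastest is kept. As a hedged sketch (not FBGEMM's actual tuner; `autotune`, the candidate list, and the toy workload are all hypothetical), the core loop could look like:

```python
import time

def autotune(candidates, workload):
    """Return the candidate parameter with the lowest measured runtime.
    `workload(param)` runs the kernel being tuned with that parameter.
    (Illustrative only; a real tuner would warm up and take the best of
    several repetitions to reduce timing noise.)"""
    best, best_t = None, float("inf")
    for c in candidates:
        t0 = time.perf_counter()
        workload(c)
        t = time.perf_counter() - t0
        if t < best_t:
            best, best_t = c, t
    return best

def make_workload(n=100_000):
    """Toy stand-in for a GEMM kernel: a blocked row-sum over `tile`-sized chunks."""
    data = list(range(n))
    def run(tile):
        for i in range(0, n, tile):
            sum(data[i:i + tile])
    return run

# Pick the fastest tile size among some hypothetical candidates.
best_tile = autotune([8, 12, 13, 16], make_workload())
```

On real hardware the chosen parameter depends on cache sizes and kernel shapes, which is why the x86-tuned values need re-tuning for a RISC-V target.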
Could you provide the reasons in
I am currently learning to port the fp16 multiplication of fbgemm to RISC-V. I found that the fp16 gemm uses cblas_gemm_compute, which uses partition_avx512 to partition the mb_max=120 rows of matrix A into several tiles. For example, 49 rows are mapped to 3 x 13 rows + 1 x 10 rows. Couldn't it instead be 4 x 12 rows + 1 x 1 row? From my perspective, these tile sizes seem irregular.