How is partition_avx512 auto-tuned? #2101
Comments
This is based on tuning results on x86 CPUs, so you can change it to whatever works better for the RISC-V processor you're optimizing for.
Yes: we previously tuned this with something like https://github.com/pytorch/FBGEMM/pull/82/files (that one is for avx2). You can adjust it for your customized HW.
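The tuning referenced above is empirical: candidate blocking parameters are benchmarked and the fastest is kept. As a hedged sketch (not FBGEMM's actual tuner; `autotune`, the candidate list, and the toy workload are all hypothetical), the core loop could look like:

```python
import time

def autotune(candidates, workload):
    """Return the candidate parameter with the lowest measured runtime.
    `workload(param)` runs the kernel being tuned with that parameter.
    (Illustrative only; a real tuner would warm up and take the best of
    several repetitions to reduce timing noise.)"""
    best, best_t = None, float("inf")
    for c in candidates:
        t0 = time.perf_counter()
        workload(c)
        t = time.perf_counter() - t0
        if t < best_t:
            best, best_t = c, t
    return best

def make_workload(n=100_000):
    """Toy stand-in for a GEMM kernel: a blocked row-sum over `tile`-sized chunks."""
    data = list(range(n))
    def run(tile):
        for i in range(0, n, tile):
            sum(data[i:i + tile])
    return run

# Pick the fastest tile size among some hypothetical candidates.
best_tile = autotune([8, 12, 13, 16], make_workload())
```

On real hardware the chosen parameter depends on cache sizes and kernel shapes, which is why the x86-tuned values need re-tuning for a RISC-V target.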
Could you provide the reasons in
I am currently learning to port the fp16 multiplication of fbgemm to RISC-V. I found that the fp16 gemm uses cblas_gemm_compute, which uses partition_avx512 to partition the mb_max=120 rows of matrix A into several tiles. For example, 49 rows are mapped to 3 x 13 rows + 1 x 10 rows. Couldn't it instead be 4 x 12 rows + 1 x 1 row? From my perspective, these tile sizes seem irregular.