Weird C optimization killer in the matmul benchmark #651

@gabrielsferre

In the matmul benchmark's main function, m.matmul, moving the CheckGC() intermediate representation command can cause a 36% drop in performance. Moving this command seems to defeat some important C compiler optimization, but it's not clear which optimization or why this happens.

This caught my eye while implementing a compiler optimization that would reduce a basic block's number of CheckGCs by grouping them all at the end of the block.
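For readers unfamiliar with the idea, here is a minimal sketch of that kind of grouping pass. It is not the compiler's actual implementation: the list-of-strings block representation and the name `sink_check_gc` are hypothetical, chosen only for illustration, and the sketch does not check whether sinking the command is safe with respect to the commands it moves past.

```python
def sink_check_gc(block):
    """Merge every CheckGC() in a basic block into a single one at the end.

    `block` is a hypothetical representation: a list of IR commands as
    strings, optionally ending in a terminator (a jump or return).
    Returns a new list; the input is not modified.
    """
    def is_gc(cmd):
        return cmd.strip() == "CheckGC()"

    # Nothing to group if the block has no CheckGC() at all.
    if not any(is_gc(cmd) for cmd in block):
        return list(block)

    # Drop all existing CheckGC() commands, keeping the others in order.
    rest = [cmd for cmd in block if not is_gc(cmd)]

    # The single CheckGC() goes just before the terminator, if there is one.
    has_terminator = bool(rest) and rest[-1].lstrip().startswith(("jmp", "return"))
    insert_at = len(rest) - 1 if has_terminator else len(rest)
    return rest[:insert_at] + ["CheckGC()"] + rest[insert_at:]


if __name__ == "__main__":
    example = [
        "x4 <- NewArr(0)",
        "CheckGC()",
        "x5 <- #x1",
        "x6 <- #x2",
        "jmpIf x12, 2, 5",
    ]
    for cmd in sink_check_gc(example):
        print(cmd)
```

A real pass would work on the compiler's own IR data structures and would need to ensure that no command between the original and new positions depends on the collection having already run.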

Currently, the first basic block of m.matmul looks like this:

function m.matmul(x1, x2): x3 {
  1:
    x4 <- NewArr(0)
    CheckGC()
    x5 <- #x1
    x6 <- #x2
    x8 <- x2[1]
    x7 <- #x8
    x9, x12, x11, x10 <- ForPrep(1, x5, 1)
    jmpIf x12, 2, 5

If we move CheckGC() to the end of the block, just before jmpIf, the runtime of the benchmark increases considerably.

function m.matmul(x1, x2): x3 {
  1:
    x4 <- NewArr(0)
    x5 <- #x1
    x6 <- #x2
    RenormArr(x2, 1)
    x8 <- x2[1]
    x7 <- #x8
    x9, x12, x11, x10 <- ForPrep(1, x5, 1)
    CheckGC()
    jmpIf x12, 2, 5

Right after this block, the function enters its first loop. It seems that moving CheckGC() disturbs the C compiler's optimizer (I'm using GCC 15.1.1 with the -O3 flag). Furthermore, if we change m.matmul so that its first loop is a while loop instead of a for loop, moving CheckGC() no longer causes this problem.
