Description
In the main function of the matmul benchmark, m.matmul, moving the CheckGC() intermediate representation instruction can cause a performance drop of 36%. Moving this instruction seems to defeat some important optimization in the C compiler, but it is not clear which optimization is affected or why.
This caught my eye while implementing a compiler optimization that would reduce a basic block's number of CheckGCs by grouping them all at the end of the block.
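For illustration only, here is a minimal Python sketch of the grouping idea described above. It is not the actual compiler pass: it models a basic block as a plain list of instruction strings, sink_check_gc is a hypothetical name, and it assumes that any instruction starting with "jmp" is the block terminator. It also ignores whether the move is safe with respect to the garbage collector.

```python
def sink_check_gc(block):
    """Replace all CheckGC() instructions in a block with a single one
    placed at the end of the block, just before the terminator (if any)."""
    if "CheckGC()" not in block:
        return list(block)  # nothing to group, return an unchanged copy
    body = [ins for ins in block if ins != "CheckGC()"]
    # Assumption: a trailing instruction starting with "jmp" is the terminator.
    if body and body[-1].startswith("jmp"):
        return body[:-1] + ["CheckGC()", body[-1]]
    return body + ["CheckGC()"]


# The first basic block of m.matmul, as listed in this issue.
block = [
    "x4 <- NewArr(0)",
    "CheckGC()",
    "x5 <- #x1",
    "x6 <- #x2",
    "x8 <- x2[1]",
    "x7 <- #x8",
    "x9, x12, x11, x10 <- ForPrep(1, x5, 1)",
    "jmpIf x12, 2, 5",
]
moved = sink_check_gc(block)
print("\n".join(moved))
```

Applied to this block, the sketch leaves a single CheckGC() immediately before jmpIf, which is the placement that triggers the slowdown.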
Currently, the first basic block of m.matmul looks like this:
function m.matmul(x1, x2): x3 {
1:
x4 <- NewArr(0)
CheckGC()
x5 <- #x1
x6 <- #x2
x8 <- x2[1]
x7 <- #x8
x9, x12, x11, x10 <- ForPrep(1, x5, 1)
jmpIf x12, 2, 5
If we move CheckGC() to the end of the block, just before jmpIf, the runtime of the benchmark increases considerably.
function m.matmul(x1, x2): x3 {
1:
x4 <- NewArr(0)
x5 <- #x1
x6 <- #x2
RenormArr(x2, 1)
x8 <- x2[1]
x7 <- #x8
x9, x12, x11, x10 <- ForPrep(1, x5, 1)
CheckGC()
jmpIf x12, 2, 5
Right after this block, the function enters its first loop. It seems that moving CheckGC() disturbs the C compiler's optimizer (I'm using GCC v.15.1.1 with the -O3 flag). Furthermore, if we change m.matmul to use a while loop instead of a for loop as the first loop, moving CheckGC() no longer causes this problem.