Differentiating affine expressions with large matrices is slow #452

Open
Red-Portal opened this issue Feb 1, 2025 · 1 comment
Labels: enhancement (performance), high priority


@Red-Portal

Hi,

I noticed that Mooncake is significantly slower when differentiating simple affine expressions that involve large matrices. For instance:

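```julia
using LinearAlgebra, DifferentiationInterface, BenchmarkTools
import Mooncake, Zygote  # load the AD backends used by AutoMooncake / AutoZygote

n = 1000
A = LowerTriangular(randn(n, n))  # lower-triangular wrapper around a dense n×n matrix
b = randn(n)

function f(x)
    y = A * x + b      # affine map
    sum(abs2, y)       # squared Euclidean norm of the result
end
```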

For this function, Mooncake is more than an order of magnitude slower than Zygote, which is a bit surprising to me:

```
julia> prep = prepare_gradient(f, AutoMooncake(; config=nothing), randn(1000))
       @benchmark DifferentiationInterface.value_and_gradient(f, prep, AutoMooncake(; config=nothing), randn(1000))
BenchmarkTools.Trial: 499 samples with 1 evaluation per sample.
 Range (min … max):   8.292 ms … 15.307 ms  ┊ GC (min … max): 0.00% … 10.96%
 Time  (median):      9.601 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.017 ms ±  1.128 ms  ┊ GC (mean ± σ):  1.36% ±  4.24%

           ▁▁▂█▅▁▃                                             
  ▂▁▁▂▃▄▄▄▆███████▅▅▃▄▃▂▂▁▂▃▃▂▃▃▃▄▃▄▃▃▂▃▃▂▃▃▂▂▃▂▃▃▂▁▁▃▂▁▁▁▁▁▂ ▃
  8.29 ms         Histogram: frequency by time          14 ms <

 Memory estimate: 15.34 MiB, allocs estimate: 108.

julia> prep = prepare_gradient(f, AutoZygote(), randn(1000))
       @benchmark DifferentiationInterface.value_and_gradient(f, prep, AutoZygote(), randn(1000))
BenchmarkTools.Trial: 7491 samples with 1 evaluation per sample.
 Range (min … max):  490.286 μs …   3.431 ms  ┊ GC (min … max): 0.00% … 75.51%
 Time  (median):     604.661 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   664.965 μs ± 262.917 μs  ┊ GC (mean ± σ):  7.18% ± 12.41%

     ▅█▆▄▃▃▃▁                                                   ▁
  ▃▅▇████████▇█▆▆▆▆▃▄▅▅▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▇▆▇██▇▇█▇▆ █
  490 μs        Histogram: log(frequency) by time       1.96 ms <

 Memory estimate: 7.67 MiB, allocs estimate: 48.
```
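For reference, `f` has a closed-form gradient: with y = Ax + b, ∇f(x) = 2Aᵀy, i.e. one triangular matrix-vector product forward and one backward. A minimal hand-written sketch of this (the helper `f_and_grad` is hypothetical, not part of Mooncake or DifferentiationInterface) can serve as a correctness and timing baseline for both AD backends; the central-difference check relies on `f` being quadratic in `x`, which makes the difference exact up to rounding.

```julia
using LinearAlgebra

# Hypothetical helper (not part of Mooncake or DifferentiationInterface):
# value and closed-form gradient of f(x) = sum(abs2, A * x + b).
function f_and_grad(A, b, x)
    y = A * x + b                      # forward: one triangular mat-vec
    return sum(abs2, y), 2 .* (A' * y) # backward: ∇f(x) = 2 * A' * y
end

n = 1000
A = LowerTriangular(randn(n, n))
b = randn(n)
x = randn(n)

val, grad = f_and_grad(A, b, x)

# f is quadratic in x, so a central difference is exact up to rounding;
# compare it against the first gradient entry.
ε = 1e-3
e1 = zeros(n); e1[1] = 1.0
fd = (f_and_grad(A, b, x + ε * e1)[1] - f_and_grad(A, b, x - ε * e1)[1]) / (2ε)
println(isapprox(fd, grad[1]; rtol = 1e-6, atol = 1e-3))
```

Timing `f_and_grad(A, b, x)` with `@benchmark` next to the runs above would indicate roughly how much of each backend's time goes beyond the two mat-vecs.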
@willtebbutt
Member

willtebbutt commented Feb 1, 2025

Thanks for opening this issue -- I've just done a quick benchmark locally, and it looks like my rule for `trmv!` is actually the culprit -- it appears to be really quite slow, even though it's type-stable. I'll take a proper look tomorrow.
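One way to isolate the cost of the `trmv!` rule is to differentiate a function that calls `BLAS.trmv!` directly and compare its gradient time with its primal time. The sketch below assumes that Mooncake routes this call through the `trmv!` rule mentioned above; the test function `g` and the matrix `M` are hypothetical names chosen for illustration, and `M` is declared `const` so that global type instability does not muddy the measurement.

```julia
using LinearAlgebra, BenchmarkTools, DifferentiationInterface
import Mooncake

n = 1000
const M = randn(n, n)   # dense storage; only the lower triangle is read

# Hypothetical test function that calls BLAS.trmv! directly, so the gradient
# should go through Mooncake's trmv! rule without the LowerTriangular wrapper.
function g(x)
    y = copy(x)                      # trmv! overwrites its vector argument
    BLAS.trmv!('L', 'N', 'N', M, y)  # y := L * y, L = lower triangle of M
    return sum(abs2, y)
end

backend = AutoMooncake(; config=nothing)
x = randn(n)
prep = prepare_gradient(g, backend, x)

primal = @benchmark g($x)
grad = @benchmark value_and_gradient(g, $prep, $backend, $x)
slowdown = minimum(grad).time / minimum(primal).time  # gradient cost relative to primal
```

`copy(x)` keeps `g` free of side effects on its input, since `trmv!` works in place. A reverse pass for this function should need roughly one extra triangular mat-vec, so a `slowdown` far above a small constant factor would point at overhead in the rule itself.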

willtebbutt added the enhancement (performance) and high priority labels on Feb 1, 2025