Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

batch processing cupyx.scipy.interpolate.cubicspline #8292

Open
dashanxf opened this issue Apr 15, 2024 · 6 comments
Open

batch processing cupyx.scipy.interpolate.cubicspline #8292

dashanxf opened this issue Apr 15, 2024 · 6 comments
Labels
cat:performance Performance in terms of speed or memory consumption

Comments

@dashanxf
Copy link

dashanxf commented Apr 15, 2024

Description

Dear cupy developing team,
I have some trouble using the cubic spline feature which is mentioned in the documentation. It seems when I install the cupy library the cubic spline part is not included. I tried to copy the version on github:https://github.com/cupy/cupy/blob/main/cupyx/scipy/interpolate/_cubic.py#L512 to my local python folder then import the library. It can run for 1D (x,y) array for me. But when I try to run it for a 2D y matrix with axis = 1, the function takes super long time to run. For example:
x = cp.arange(2048)
y = cp.random.rand(8192,2048)
interp = CubicSpline(x,y,axis=1)
It will take about one minute for this code to run. When I was using pchip interpolation provided by cupy, it only takes less than 2 seconds to complete. I took a deeper look and it seems like the cupyx.scipy.sparse.linalg.spsolve function that solve the linear system Ax = b is taking most of the time to complete. It seems like its basically iterating over the 8192 1D y-array of size 2048 and solve for the spline coefficients one by one. For my application, I have to use the cubic spline for interpolation. Is there any suggestion that I can improve the performance?

Additional Information

No response

@dashanxf dashanxf added the cat:feature New features/APIs label Apr 15, 2024
@kmaehashi kmaehashi added cat:performance Performance in terms of speed or memory consumption and removed cat:feature New features/APIs labels Apr 16, 2024
@takagi
Copy link
Member

takagi commented Apr 16, 2024

I could reproduce this behavior locally. Is this inevitable for the current implementation? Any suggestion for improvement, @ev-br?

@ev-br
Copy link
Contributor

ev-br commented Apr 16, 2024

The current spsolve implementation uses a python loop over the 2nd axis of the r.h.s.: https://github.com/cupy/cupy/blob/main/cupyx/scipy/sparse/linalg/_solve.py#L518

Possible ways to improve:

  • push the loop to the C/Cython level.
  • in make_interp_spline, use SuperLU instead of csrlsvqr. Not sure if that would mean a scipy dependency and/or device-host data transfers?
  • in make_interp_spline, use the _qr_solve kernel from make_lsq_spline instead of spsolve. The kernel is sequential on GPU ATM, and is not as well crafted/tested as csrlsvqr (obviously).

@dashanxf
Copy link
Author

From what I have researched, my thought toward this behavior is when looping through the 2nd axis of r.h.s we perform QR decomposition to matrix A each time we do csrlsvqr which will be O(n^3) time complexity. But for cubic spline, the matrix A should always be the same for all res[:, j] = cupyx.cusolver.csrlsvqr(A, b[:, j]). So we actually do not need to perform QR decomposition each time we solve for res[:,j]. If my guess is somehow correct, is there a way to cache the QR decomposition result then each time we simply solve Rx = Q^T b[:,j]?

@ev-br
Copy link
Contributor

ev-br commented Apr 16, 2024

THis is basically what the _qr_solve (the third option above) does under the hood: https://github.com/cupy/cupy/blob/main/cupyx/scipy/interpolate/_bspline2.py#L688

@ev-br
Copy link
Contributor

ev-br commented Apr 19, 2024

Why closing as completed? make_interp_spline is still using a python loop, only make_lsq_spline invokes _qr_solve ATM

@dashanxf
Copy link
Author

I update my local version of cubicspline to call make_lsq_spline instead of make_interp_spline. The performance is way better than before and the result looks quite satisfying. Although the processing time is still not comparable to Matlab version of cubicspline, it is good enough for me ATM. I can reopen it. Sorry for the confusion

@dashanxf dashanxf reopened this Apr 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat:performance Performance in terms of speed or memory consumption
Projects
None yet
Development

No branches or pull requests

4 participants