Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy rectilinear interpolator, with sparse #6006

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

fnattino
Copy link
Contributor

🚀 Pull Request

Description

Enable the rectilinear interpolator to run lazily #6002.

The original implementation makes use of a CSR matrix from scipy.sparse to represent the weights. However, this does not seem to work well with Dask arrays, since it does not adhere (completely) to the array interface. Switching to sparse arrays from sparse seems to work well, allowing the sparse matrix to be wrapped in a Dask array.

Would this be an acceptable change? On the long run, it looks like sparse will replace scipy.sparse in the PyData ecosystem..

@fnattino
Copy link
Contributor Author

fnattino commented Jun 18, 2024

Additional note on the use of sparse: this is also the library used by xESMF. The drawback is that it would add numba as a dependency, which I realise it might not be ideal..

@fnattino
Copy link
Contributor Author

Marking it as ready for review to hear your thoughts on this. What is in here for now seems to work (see e.g. code snippet below), provided that sparse is installed.

Test code snippet
import iris

from iris.analysis import RectilinearInterpolator


LATITUDE = [16., 16.1]
LONGITUDE = 226.


def get_cube():
    filename = iris.sample_data_path('E1_north_america.nc')
    return iris.load_cube(filename, 'air_temperature')


def interpolate(method):
    assert method in ('linear', 'nearest')
    cube = get_cube()
    coords = ('latitude', 'longitude')
    points = (LATITUDE, LONGITUDE)
    interpolator = RectilinearInterpolator(cube, coords, method, "mask")
    print('Cube is lazy: ', cube.has_lazy_data())
    # Cube is lazy:  True
    result = interpolator(points, collapse_scalar=True)
    print('Result is lazy: ', result.has_lazy_data())
    # Result is lazy:  True


if __name__ == '__main__':
    interpolate('linear')
    interpolate('nearest')

@fnattino fnattino marked this pull request as ready for review June 18, 2024 09:37
@trexfeathers
Copy link
Contributor

Hi @fnattino, thanks for your hard work. Just to warn you I expect several weeks' delay before we can get to this as we are accumulating a backlog of ESMValTool changes while we prepare GeoVista for SciPy 2024 and complete the mesh-focused Iris 3.10 release.

@fnattino fnattino changed the title Lazy rectilinear interpolator Lazy rectilinear interpolator, with sparse Jul 25, 2024
@fnattino
Copy link
Contributor Author

I have worked on a different approach that makes the intepolator lazy by using a similar approach as used in the regridder, see #6084. This has the disadvantage that it requires merging the chunks along the interpolating dimensions, but it is not as "disruptive" as in it does not add new dependencies, so maybe worth to have a look at #6084 first keeping this for the longer run? In the meantime I can also try to run some benchmarks to compare the performance of sparse vs scipy.sparse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

2 participants