Lazy rectilinear interpolator, with sparse #6006
fnattino wants to merge 9 commits into SciTools:main
they seem to better follow the array interface
Additional note on the use of |
…is into lazy-rectilinearinterpolator
Marking it as ready for review to hear your thoughts on this. What is in here for now seems to work (see e.g. code snippet below), provided that

Test code snippet

    import iris
    from iris.analysis import RectilinearInterpolator

    LATITUDE = [16., 16.1]
    LONGITUDE = 226.


    def get_cube():
        filename = iris.sample_data_path('E1_north_america.nc')
        return iris.load_cube(filename, 'air_temperature')


    def interpolate(method):
        assert method in ('linear', 'nearest')
        cube = get_cube()
        coords = ('latitude', 'longitude')
        points = (LATITUDE, LONGITUDE)
        interpolator = RectilinearInterpolator(cube, coords, method, "mask")
        print('Cube is lazy: ', cube.has_lazy_data())
        # Cube is lazy: True
        result = interpolator(points, collapse_scalar=True)
        print('Result is lazy: ', result.has_lazy_data())
        # Result is lazy: True


    if __name__ == '__main__':
        interpolate('linear')
        interpolate('nearest')
Hi @fnattino, thanks for your hard work. Just to warn you I expect several weeks' delay before we can get to this as we are accumulating a backlog of ESMValTool changes while we prepare GeoVista for SciPy 2024 and complete the mesh-focused Iris 3.10 release.
I have worked on a different approach that makes the interpolator lazy in a similar way to the regridder, see #6084. It has the disadvantage that it requires merging the chunks along the interpolating dimensions, but it is less "disruptive" in that it does not add new dependencies. So maybe it is worth having a look at #6084 first and keeping this for the longer run? In the meantime I can also try to run some benchmarks to compare the performance of
🚀 Pull Request
Description
Enable the rectilinear interpolator to run lazily #6002.
The original implementation makes use of a CSR matrix from `scipy.sparse` to represent the weights. However, this does not seem to work well with Dask arrays, since it does not adhere (completely) to the array interface. Switching to sparse arrays from `sparse` seems to work well, allowing the sparse matrix to be wrapped in a Dask array.

Would this be an acceptable change? In the long run, it looks like `sparse` will replace `scipy.sparse` in the PyData ecosystem.
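For readers unfamiliar with the "weights as a CSR matrix" idea mentioned above, here is a minimal pure-Python sketch of it for the 1-D linear case. It is not the Iris implementation and uses neither `scipy.sparse` nor `sparse`: the helper names `linear_weights_csr` and `csr_matvec` are hypothetical, the CSR layout is held in plain lists, and out-of-range targets are simply extrapolated from the nearest interval rather than masked.

```python
from bisect import bisect_right


def linear_weights_csr(src, targets):
    """Build 1-D linear interpolation weights as a CSR triple.

    Hypothetical helper for illustration only. `src` must be sorted and
    contain at least two distinct points. Each target contributes one row
    with two non-zero weights (left and right neighbour).
    """
    data, indices, indptr = [], [], [0]
    for x in targets:
        # Left neighbour index, clamped so that i + 1 is always valid.
        i = min(max(bisect_right(src, x) - 1, 0), len(src) - 2)
        frac = (x - src[i]) / (src[i + 1] - src[i])
        data.extend([1.0 - frac, frac])
        indices.extend([i, i + 1])
        indptr.append(len(data))
    return data, indices, indptr


def csr_matvec(data, indices, indptr, values):
    """Multiply the CSR weights matrix by a vector of source values."""
    return [
        sum(data[k] * values[indices[k]] for k in range(indptr[row], indptr[row + 1]))
        for row in range(len(indptr) - 1)
    ]


src_points = [0.0, 1.0, 2.0]
src_values = [10.0, 20.0, 40.0]
weights = linear_weights_csr(src_points, [0.5, 1.25])
result = csr_matvec(*weights, src_values)
print(result)  # [15.0, 25.0]
```

The weights depend only on the source and target coordinates, not on the data values, which is why the same matrix can be reused for every block of data once it is stored in an array type that Dask can wrap.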