Improve performance when subsetting, including shapefile subsetting #316

Open
jamesfwood opened this issue Jan 13, 2025 · 0 comments

Investigate possible performance improvements to the shapefile subsetting capability. This capability is currently implemented by brute force, but there may be a more elegant solution. Because this is L2 (swath) data, many existing tools for this, such as rioxarray, may not work, since they generally assume regularly gridded data.
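For reference, the exact test at the core of shapefile subsetting on swath data is a point-in-polygon check of every (lon, lat) pixel. The sketch below uses a hypothetical helper, `points_in_polygon`; it is not l2ss-py's actual code. It assumes the shapefile geometry has already been read into one polygon ring given as a list of (lon, lat) vertices, and it ignores holes, multipolygons, and antimeridian crossings. It applies the even-odd (ray casting) rule with NumPy, so the Python loop runs over polygon edges rather than over pixels. Closed rings with a repeated first vertex, as shapefiles store them, also work because the degenerate closing edge never counts as a crossing.

```python
import numpy as np


def points_in_polygon(lon, lat, vertices):
    """Hypothetical helper: True where (lon, lat) falls inside the polygon.

    lon/lat may be 2-D swath coordinate arrays of any shape; vertices is a list
    of (lon, lat) pairs for a single ring (no holes, no antimeridian handling).
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    inside = np.zeros(lon.shape, dtype=bool)
    n = len(vertices)
    # Loop over polygon edges; each step is vectorized over every pixel.
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through each pixel?
        crosses = (y1 > lat) != (y2 > lat)
        # Longitude where the edge meets that line (horizontal edges give
        # inf/nan here, but they never "cross", so the value is ignored).
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
        # Even-odd rule: toggle for each crossing east of the pixel.
        inside ^= crosses & (lon < x_cross)
    return inside


# Example: a 2x2 "swath" tested against a 10x10 square.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
lon = np.array([[5.0, 15.0], [1.0, -1.0]])
lat = np.array([[5.0, 5.0], [9.0, 9.0]])
mask = points_in_polygon(lon, lat, square)
# mask.tolist() -> [[True, False], [True, False]]
```

In this sketch the work grows with (number of pixels) × (number of polygon edges), so shrinking the polygon still leaves every pixel tested. A spatial index, such as the one xoak provides for irregular grids, or cheap pre-filters could reduce how many pixels reach the exact test.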

The following library may be useful in achieving this: https://github.com/xarray-contrib/xoak
Also, see the following discussion on this exact topic: corteva/rioxarray#202

==============================================

This request came from Chris Durbin on the Harmony team, whose team built a tool that performs subsetting using AI tools. They found that l2ss-py was the slowest component in their tool and could use speed improvements.

Here are some comments from Chris on this:

The requests that took minutes when we provided a shapefile were the ones handled by l2ss-py. We tried limiting the shapefile to a small number of points, but requests still took minutes to complete. We saw that maskfill requests took only a couple of seconds with a shapefile, so we wondered whether it would be possible to leverage it to perform the subsetting. I asked Owen briefly, and he said maskfill only works on gridded data, which probably makes it much faster to work with a shapefile for subsetting. In any case, it's probably worth a ticket to see if anything can be done to speed up shapefile subsetting in l2ss-py.

When I worked on CMR, we used a trick with the minimum bounding rectangle (MBR) and the largest interior rectangle (LR) to quickly find things that must be in the region or can't be in the region: anything outside the MBR can't be in the region, and anything inside the LR must be. We then performed the more expensive intersection searches only for points that were in the MBR but not in the LR.

jamesfwood changed the title from "Improve performance when subsetting, including shapefile subsetting, in l2ss-py" to "Improve performance when subsetting, including shapefile subsetting" on Jan 13, 2025