Improve performance when subsetting, including shapefile subsetting #316

jamesfwood · 2025-01-13T21:28:42Z

Investigate possible performance improvements to the shapefile subsetting capability. It is currently implemented by brute force, and there may be a more elegant solution. Because this is L2 data, many existing tools for this task, such as rioxarray, may not work.

The following library may be useful in achieving this: https://github.com/xarray-contrib/xoak
Also, see the following discussion on this exact topic: corteva/rioxarray#202

==============================================

This came from Chris Durbin on the Harmony team when they built a tool to subset using AI tools. They found that l2ss-py was the slowest piece in their tool and could use speed improvements.

Here are some comments from Chris regarding this:

The requests that took minutes when we provided a shapefile were the l2ss-py ones. We tried limiting the shapefile to a small number of points, but requests were still taking minutes to complete. We saw that maskfill requests would only take a couple of seconds with a shapefile, so we were wondering if it would be possible to leverage it to perform the subsetting. I asked Owen briefly and he said maskfill only works on gridded data, which probably makes it much faster to work with a shapefile for subsetting. In any case, it's probably worth a ticket to see if anything can be done to speed up the shapefile subsetting in l2ss-py. When I worked on CMR, we used some tricks with the minimum bounding rectangle (MBR) and largest interior rectangle (LR) to quickly find things that must be in the region or can't be in the region, and then only performed the more expensive intersection searches for points that were in the MBR but not in the LR.
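The MBR/LR trick Chris describes could look roughly like the sketch below. It is not l2ss-py or CMR code, only an illustration in plain Python. The names `subset_mask` and `point_in_polygon` are hypothetical, and the interior rectangle is assumed to be supplied by the caller and to lie fully inside the polygon (computing the largest one is a separate problem not covered here).

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test (the 'expensive' check); polygon is a list of (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def subset_mask(points, polygon, inner_rect):
    """Return (mask, n_expensive) for points against a polygon.

    inner_rect is (min_x, min_y, max_x, max_y) and is ASSUMED to lie
    entirely inside the polygon; it is not verified here.
    """
    xs = [vx for vx, _ in polygon]
    ys = [vy for _, vy in polygon]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    ix0, iy0, ix1, iy1 = inner_rect

    mask = []
    n_expensive = 0
    for x, y in points:
        if x < min_x or x > max_x or y < min_y or y > max_y:
            # Outside the MBR: cannot be in the region.
            mask.append(False)
        elif ix0 <= x <= ix1 and iy0 <= y <= iy1:
            # Inside the interior rectangle: must be in the region.
            mask.append(True)
        else:
            # Between the two rectangles: run the full test.
            n_expensive += 1
            mask.append(point_in_polygon(x, y, polygon))
    return mask, n_expensive


# Diamond-shaped region with a square that fits inside it.
diamond = [(0, 2), (2, 0), (4, 2), (2, 4)]
inner = (1, 1, 3, 3)
pts = [(5, 5), (2, 2), (0.2, 0.2), (2, 3.5)]
mask, n_expensive = subset_mask(pts, diamond, inner)
```

In this example only two of the four points reach the ray-casting test. For real L2 granules the same three-way split would more likely be done with vectorized array comparisons than with a Python loop.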

jamesfwood mentioned this issue Jan 13, 2025

Improve shapefile subsetting performance #69

Closed

jamesfwood changed the title ~~Improve performance when subsetting, including shapefile subsetting, in l2ss-py~~ Improve performance when subsetting, including shapefile subsetting Jan 13, 2025
