Investigate possible performance improvements to the shapefile subsetting capability. It is currently brute-forced, but there might be a more elegant solution. Because this is L2 data, many existing tools for this kind of subsetting, such as rioxarray, may not work.
This came from Chris Durbin on the Harmony team, who built a tool that performs subsetting using AI tools. They found that l2ss-py was the slowest piece of their tool and could use speed improvements.
Here are some comments from Chris regarding this:
The requests that took minutes when we provided a shapefile were l2ss-py. We tried limiting the number of points in the shapefile to a small number, but were still seeing minutes to complete. We saw maskfill requests would only take a couple of seconds with a shapefile, so we were wondering if it would be possible to leverage it to perform the subsetting. I asked Owen briefly and he said maskfill only works on gridded data, which probably makes it much faster to work with a shapefile for subsetting. In any case, it's probably worth a ticket to see if anything can be done to speed up the shapefile subsetting in l2ss-py. I know when I worked on CMR we used some tricks with the minimum bounding rectangle (MBR) and largest interior rectangle (LR) to quickly find things that must be in the region or can't be in the region, and then only performed the more expensive intersection searches for points that were in the MBR but not in the LR.
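The MBR/LR trick Chris describes could be sketched roughly as below. This is only an illustration under stated assumptions, not how l2ss-py or CMR implement it: every function name is hypothetical, the interior rectangle is supplied by the caller rather than computed, and longitude/latitude are treated as planar coordinates (no antimeridian or pole handling). It is plain Python for readability; a real version would presumably vectorize the box tests over the coordinate arrays.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]               # (lon, lat)
Box = Tuple[float, float, float, float]   # (min_lon, min_lat, max_lon, max_lat)


def bounding_box(polygon: Sequence[Point]) -> Box:
    """Minimum bounding rectangle (MBR) of the polygon's vertices."""
    lons = [p[0] for p in polygon]
    lats = [p[1] for p in polygon]
    return min(lons), min(lats), max(lons), max(lats)


def in_box(point: Point, box: Box) -> bool:
    """Cheap axis-aligned rectangle test (edges count as inside)."""
    lon, lat = point
    return box[0] <= lon <= box[2] and box[1] <= lat <= box[3]


def in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting: the expensive per-point test we want to avoid."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def subset_mask(
    points: Sequence[Point],
    polygon: Sequence[Point],
    interior_box: Optional[Box] = None,
) -> Tuple[List[bool], int]:
    """Return (mask, number of expensive point-in-polygon tests run).

    interior_box is assumed to lie entirely inside the polygon; it is
    an input here, not something this sketch computes.
    """
    mbr = bounding_box(polygon)
    mask: List[bool] = []
    expensive_tests = 0
    for point in points:
        if not in_box(point, mbr):
            mask.append(False)            # outside the MBR: cannot be inside
        elif interior_box is not None and in_box(point, interior_box):
            mask.append(True)             # inside the LR: must be inside
        else:
            expensive_tests += 1          # ambiguous band: do the full test
            mask.append(in_polygon(point, polygon))
    return mask, expensive_tests


# Hypothetical data: a diamond-shaped region and four sample points.
diamond = [(0.0, 2.0), (2.0, 0.0), (4.0, 2.0), (2.0, 4.0)]
interior = (1.0, 1.0, 3.0, 3.0)  # axis-aligned square inscribed in the diamond
samples = [(5.0, 5.0), (2.0, 2.0), (0.2, 0.2), (2.0, 3.5)]

mask, n_expensive = subset_mask(samples, diamond, interior)
```

With these sample points, only the two that fall between the MBR and the interior rectangle go through ray casting. Whether this helps in practice depends on how much of a granule falls in that band and on the cost of finding a good interior rectangle for an arbitrary shapefile polygon.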
jamesfwood changed the title from "Improve performance when subsetting, including shapefile subsetting, in l2ss-py" to "Improve performance when subsetting, including shapefile subsetting" on Jan 13, 2025
The following library may be useful for speeding up shapefile subsetting: https://github.com/xarray-contrib/xoak
Also, see the following discussion on this exact topic: corteva/rioxarray#202