Batch performant discover_nhdplus_id #417

Open
mhweber opened this issue Dec 16, 2024 · 3 comments

@mhweber
Contributor

mhweber commented Dec 16, 2024

Currently, the StreamCatTools sc_get_comid() function calls discover_nhdplus_id to derive NHDPlus COMIDs for sets of latitude and longitude values. A number of users have recently tried to speed this up by parallelizing calls or sending batch requests, which exceed the server limits of the underlying NLDI service.

StreamCatTools has a similar function, lc_get_comid, which calls nhdplusTools get_waterbodies and pulls NHDPlus waterbody COMIDs from the subset features.

Would calling the NHDPlus subset service, directly or via nhdplusTools, be more performant and robust than discover_nhdplus_id when deriving COMIDs for a large set of latitude/longitude points?
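To make the question concrete, here is a rough sketch of the subset-style approach: fetch catchment polygons once for the bounding box of the points, then join locally instead of making one NLDI request per point. It is not how sc_get_comid() or discover_nhdplus_id currently work. The helper name comids_by_join is hypothetical, and the sketch assumes that nhdplusTools::get_nhdplus() accepts a polygon AOI with realization = "catchment", that the returned catchments carry the COMID in a featureid column, and that the points fall within CONUS so EPSG:5070 is a reasonable working projection.

```r
library(sf)

# Hypothetical helper: derive COMIDs for many points with one catchment
# download and a local point-in-polygon lookup.
comids_by_join <- function(points, catchments = NULL) {
  if (is.null(catchments)) {
    # Assumption: one AOI query covering the bounding box of all points.
    aoi <- st_as_sfc(st_bbox(points))
    catchments <- nhdplusTools::get_nhdplus(AOI = aoi,
                                            realization = "catchment")
  }

  # Work in a projected CRS (assumed: CONUS Albers) for planar intersection.
  points_p <- st_transform(points, 5070)
  catchments_p <- st_transform(catchments, 5070)

  hits <- st_intersects(points_p, catchments_p)

  # Take the first matching catchment per point; NA when nothing matches.
  vapply(hits, function(idx) {
    if (length(idx) > 0) as.integer(catchments_p$featureid[idx[1]]) else NA_integer_
  }, integer(1))
}
```

Whether this is faster depends on how spread out the points are: a bounding box covering a large region could pull far more catchments than needed, so tiling the points into smaller areas may be necessary.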

@dblodgett-usgs
Collaborator

Thanks for prompting this, @mhweber -- I've run into this use case a few times where people have long lists and end up using patterns that don't scale well. I'll look at an alternate discover_nhdplus_id implementation and put some thought into whether there is a faster way to do it via geoserver services.

@dblodgett-usgs
Collaborator

I just merged a change that will help a bit. Will leave this open and think about whether there's a more significant update where we could do a spatial join remotely to retrieve comids.

@DEQathomps

This is timely. I'm really interested in using nhdplusTools for watershed delineations. Will you please explain how to batch process? Following the code from the vignette, this function works great when dealing with a single station (only first lines of code presented for simplicity):

start_point <- st_sfc(st_point(c(-122.802489389074, 43.85780225517)), crs = 4269)
start_comid <- discover_nhdplus_id(start_point)

However, processing multiple stations at once results in errors, server timeouts, etc. Example code below.

lon2<-c(-122.802489389074, -122.691787093599)
lat2<-c(43.85780225517, 43.9239837521485)
start_points <- st_sfc(st_point(c(lon2, lat2)), crs = 4269)
start_comids <- discover_nhdplus_id(start_points)

I run into similar complications with other steps (e.g., flowlines, catchments). I have tried multiple approaches and have the most recent version of the package installed.

Has anyone processed multiple stations simultaneously or been able to create batch watershed delineations? Any tips would be much appreciated!
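One detail in the multi-station example above is worth checking: st_point(c(lon2, lat2)) receives four numbers and builds a single four-coordinate point, not two stations. A minimal sketch of an alternative follows, assuming sf is installed and that discover_nhdplus_id is most reliable when given one point per call. The helper name lookup_comids and its pause argument are hypothetical, and the loop is deliberately sequential to stay within the NLDI limits mentioned earlier in the thread.

```r
library(sf)

# Hypothetical helper: one COMID lookup per station, one request at a time.
lookup_comids <- function(lon, lat, crs = 4269,
                          lookup = nhdplusTools::discover_nhdplus_id,
                          pause = 0.5) {
  # Build one POINT per (lon, lat) pair.
  pts <- st_as_sf(data.frame(lon = lon, lat = lat),
                  coords = c("lon", "lat"), crs = crs)
  geoms <- st_geometry(pts)

  vapply(seq_along(geoms), function(i) {
    if (i > 1) Sys.sleep(pause)  # short pause between requests
    # A failed request (error or timeout) yields NA instead of stopping the loop.
    id <- tryCatch(lookup(geoms[i]), error = function(e) NULL)
    if (length(id) == 1 && !is.na(id)) as.integer(id) else NA_integer_
  }, integer(1))
}

lon2 <- c(-122.802489389074, -122.691787093599)
lat2 <- c(43.85780225517, 43.9239837521485)
# start_comids <- lookup_comids(lon2, lat2)  # requires network access
```

Stations that come back as NA can be retried separately, which keeps one timeout from discarding the rest of the batch.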
