Investigate pyTMD for tidal modeling #124
I'm turning my attention to the issue of using pyTMD for tidal elevations, perhaps clipping the data by region for quicker data retrieval. Following up on some excellent notes from Robbi here: GeoscienceAustralia/dea-notebooks#989 (comment). I have the FES2014 model files (this is a nice write-up). I see the implementations here and here. In an activated coastseg conda env with Python 3.9, I did a ... When I do ...
Baby steps... this seems fixable. I'll look into this more tomorrow. I note that pip complains about my numpy version, so I'll start from scratch with a new conda env tomorrow and troubleshoot. I think tomorrow I'll also open those NetCDF files up and do a deep dive on exactly how the info is packaged. Robbi notes: "... so far the biggest impact on performance we've found is to clip the FES2014 tidal constituent NetCDFs to a bounding box around your study area (e.g. we clip them to Australia) rather than use the global NetCDFs out of the box (we've found it can reduce the time to load those constituents in pyTMD from minutes to just a few seconds)." We should probably do the same, but for N. America. Further, I'd like to set things up so a new set of clipped files can be quickly generated and loaded for any world region.
OK, I think I have a solution. Dependencies:
stuff for xarray
get separate regions from file
Next we need a function that reads and clips the files, adapting code from here: https://github.com/GeoscienceAustralia/dea-notebooks/blob/f3d1ad36f0eb66421497c5fc5e594c1eee936675/Tools/dea_tools/climate.py#L170
Implementation (a sketch follows): this writes out 11 region files for each .nc tidal constituent.
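For reference, a minimal sketch of what that clipping step might look like, assuming the 11 regions live in a geojson file (the `regions.geojson` name and its `name` column are hypothetical) and the global constituent files sit under `fes2014/ocean_tide/`:

```python
import glob
import os

import geopandas as gpd
import xarray as xr

regions = gpd.read_file("regions.geojson")  # one polygon per region (11 total)

for nc_file in glob.glob("fes2014/ocean_tide/*.nc"):
    with xr.open_dataset(nc_file) as ds:
        for _, region in regions.iterrows():
            minx, miny, maxx, maxy = region.geometry.bounds
            # FES2014 grids typically use 0-360 longitudes; shift the bounds
            # to match (regions crossing the 0/360 seam need extra handling)
            minx, maxx = minx % 360, maxx % 360
            clipped = ds.sel(lon=slice(minx, maxx), lat=slice(miny, maxy))
            out_dir = os.path.join("clipped", str(region["name"]), "ocean_tide")
            os.makedirs(out_dir, exist_ok=True)
            clipped.to_netcdf(os.path.join(out_dir, os.path.basename(nc_file)))
```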
The regions I'm using:
Next, I will attempt to use these files with pyTMD for a prediction. For that, I will need to figure out the env issues I mentioned yesterday. Assuming all goes well, we could prep CoastSeg to adopt this workflow.
Any steps I'm missing?
Thank you for figuring this out. I want to make sure I understand what you mean for each of these steps. For steps 2 & 3, do you mean we have a geojson file containing the bounds of each region with tide data, and we check whether the transect falls within those bounds?
There is a potential that clipping all the .nc files on the fly would cause a bottleneck. If we can find a way to store them, then fetch only the .nc files we need for the small portion of the world we are working with, that would be ideal. It would be a similar workflow to the global shorelines workflow (which is another feature we should mention in the TCA meeting). The only thing is, I don't know if we can store the .nc files somewhere online or not.
Yes, the file I shared has 11 regions. What I am proposing is that, for any transect, we know which region that transect is in. Simplifying this by representing each transect with a single coordinate (say, the end or midpoint), we need a good way to query a polygon to see if a point is inside. I believe there are ways to do this using shapely, geopandas, and possibly others; see the sketch below.
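A small sketch of one way to do that lookup with geopandas, assuming the regions file carries a `name` column (hypothetical) identifying each region:

```python
import geopandas as gpd
from shapely.geometry import Point

regions = gpd.read_file("regions.geojson")  # the 11-region file shared above

def region_for_point(lon: float, lat: float):
    """Return the name of the region polygon containing (lon, lat), or None."""
    point = gpd.GeoDataFrame(geometry=[Point(lon, lat)], crs=regions.crs)
    hits = gpd.sjoin(point, regions, predicate="within")
    return hits["name"].iloc[0] if not hits.empty else None

print(region_for_point(-122.5, 37.8))  # a point near San Francisco, for example
```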
The bottleneck isn't too bad for one region. It takes 1-2 minutes to clip all 34 files in each region.
I think the license would also allow for storage of a modified set of files online, but I don't think we should go this route.
That's not too bad. Quick clarification question: I know we are clipping the 34 files to a single large region in the file you shared, but do we need to clip them again to the bounding box that the user draws?
But I think my preference is local storage. Clipped files are smaller in size and will load fast. The only downside is running a script for a region that takes 1-2 minutes to complete.
Once the set of files has been clipped for a particular region, that doesn't need to happen again. Those files can live in a local directory, and a set of files for one region is tiny.
I think that can work. I guess I'm just worried the users won't want to download 3.1 GB of .nc files. That being said, they have to with all the other pre-existing tools available, and it's much easier than with CoastSat.
Users have to download two sets of 4.5 GB of files regardless: load tide and ocean tide. There is no way around this; those files are under license with AVISO for each individual user. What we're proposing is that the user will store an ADDITIONAL set of files, totaling a few hundred MB for each region. 99% of users will only ever work in one region.
And that's why I think this has to be a local workflow: users store their own files locally. When we migrate to the cloud, the server is the individual user.
Oh! Thank you for clearing that up. I thought we were replacing the two sets of 4.5 GB of files.
In a fresh coastseg env, a
and a
next,
At first glance, nothing here seems majorly wrong. This conda env is able to execute the tide-clipping functions I posted above. OK, next I import ...
This may be because I'm using miniconda, not anaconda. I have rasterio 1.3.6, pyproj 3.3.1, geopandas 0.12.2, and gdal 3.3.1. The error seems to come from rasterio. If I do a ...
but I get an error importing it.
Changing tactics a little to use
This takes a while, but it works, I think. The output is:
I now have a prototype workflow for accessing pyTMD predictions at arbitrary places, using clipped .nc files 🎉
Next, we need two sets of vectors: the 11 regions and some test points.
Next, I cycle through each point, find the region it is in, and get a tide estimate for that region. It is super quick! A sketch of this loop follows.
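For reference, a hedged sketch of what that per-point loop can look like, assuming pyTMD's high-level `compute_tide_corrections()` (present in pyTMD 1.x/2.0) and the `region_for_point()` helper sketched earlier; the directory layout and variable names are illustrative:

```python
import numpy as np
import pyTMD

# one timestamp per call here; TYPE="drift" expects x, y, t of equal length
times = np.array(["2023-01-01T00:00:00"], dtype="datetime64[s]")

for lon, lat in test_points:  # test_points built from the vectors above
    region = region_for_point(lon, lat)
    tide = pyTMD.compute_tide_corrections(
        np.atleast_1d(lon), np.atleast_1d(lat), times,
        DIRECTORY=f"clipped/{region}",  # the clipped .nc files for this region
        MODEL="FES2014", EPSG=4326, TYPE="drift",
        TIME="datetime", METHOD="spline",
    )
    print(region, tide)
```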
I'm testing out this code and I thought I'd leave my recipe for setting up my environment for reference later.
@dbuscombe-usgs
Good news! I was able to get this code to run successfully a few times. When you ran the code, did you find that some tides were bad? For example, some tides printed ...
I have the scripts and files I used to test FES2014. The zip file contains the following:
- Data Files
- Tide Model Setup Scripts
- Tide Testing Files
- Recipe for the Environment
For the points that don't have a tide, we should set the extrapolate parameter to true. This will allow the model to "guess" the approximate tide level for that lat/lon value.
But even this idea does not work for every point (see the sketch below).
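For context, a sketch of the extrapolation option, assuming pyTMD's high-level API: `EXTRAPOLATE=True` fills points outside valid model cells by nearest-neighbor extrapolation, and `CUTOFF` bounds how far (in km) that extrapolation reaches, which is why distant points can still come back empty:

```python
import pyTMD

# x, y, times as in the earlier loop; the directory name is hypothetical
tide = pyTMD.compute_tide_corrections(
    x, y, times,
    DIRECTORY="clipped/north_america",
    MODEL="FES2014", EPSG=4326, TYPE="drift", TIME="datetime",
    EXTRAPOLATE=True,  # "guess" tides for points off the model grid
    CUTOFF=10.0,       # but only within ~10 km of valid ocean cells
)
print(tide)  # points beyond the cutoff still come back masked/NaN
```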
I did a bit of digging into dea_coastlines' code and a few other repositories that use pyTMD, and I found that they implemented the logic for the ...
I'm going to make a sample script that mimics what the ...
Awesome!
Upon deeper inspection of the model_tides script, I realized it's basically just as effective as calling the ... Unfortunately, this is not the breakthrough I thought it was, but on the plus side, I just learned a lot about how computing tides works.
Edit: the number of unique tides generated by this script was 18,911 out of 18,967 tides, so maybe this isn't such a great idea. I think we should consider reducing the number of tide predictions we make. If we can group locations that are close together and at the same time, then we should use the same tide prediction for them to reduce the number of tides we need to predict. Here is part of the output from the first script:
Hey @robbibt!
(We are using a clipped version of the .nc files; the region is clipped to just North America.)
@2320sharon another thing to explore is reducing the number of points. Instead of passing it the location of every transect, perhaps we should group transects, find the centroid location of each group, then compute tides for just that location. We'd then assign that tide value to every transect in the group. As for grouping, the effective spatial resolution of the tide prediction is probably close to the value specified by CUTOFF, i.e. 10 km. We have transects every 100 m, so there is quite a bit of interpolation/extrapolation going on... perhaps we should group all transects within, say, 1 km, use the method I outlined above, and see how long that takes? A sketch of one grouping approach follows.
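One possible way to do that grouping, sketched with scikit-learn's DBSCAN (an assumption on my part; any clustering within ~1 km would do), where `points` is a hypothetical DataFrame of transect midpoints with `lat`/`lon` columns:

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0

# cluster transect midpoints that fall within ~1 km of each other
coords = np.radians(points[["lat", "lon"]].to_numpy())

labels = DBSCAN(
    eps=1.0 / EARTH_RADIUS_KM,  # 1 km search radius, expressed in radians
    min_samples=1,              # so every transect lands in some group
    metric="haversine",
).fit_predict(coords)

points["group"] = labels
# one representative location per group: model tides only at these centroids,
# then broadcast each tide value back to every transect in the group
centroids = points.groupby("group")[["lat", "lon"]].mean()
```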
One solution: "The first one would involve grouping the transects together by something (distance, probably), then getting the centroid of that group and using that centroid point to compute the tides." What's the second?
I like the idea behind this solution. We group the transects together within 1 km (and I'll try grouping by 5 km as well), then get the centroid of each group and use that centroid point to compute the tides. This approach is very similar to what dea_notebooks has implemented, the difference being they take the centroid of the entire dataset for a single ROI.
Sorry, I misread your comment and went back to fix my response.
Cool, I think this makes sense...
By any chance, do you have any recommendations on how to group the transects together?
I'll investigate.
That works!
Hi all, I'm a little late to this thread, so apologies if you've already resolved this! This does sound like a very similar issue to what we encountered, and essentially the motivation behind having a separate `model_tides` function.
Number 2 was particularly problematic for our use case because using ... The solution for this was to only extract the constituents once per point, and then re-use them when modelling tides for every individual timestep. The key part is here: we essentially repeat/tile our extracted constituents so every point-timestep pair re-uses them.
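A paraphrased sketch of that repeat/tile pattern (not DEA's actual code), assuming pyTMD 2.x's `pyTMD.io.FES.extract_constants` and `pyTMD.predict` interfaces; `model_files`, `constituents`, and `tide_time` are assumed to come from your FES2014 setup:

```python
import numpy as np
import pyTMD.io
import pyTMD.predict

# 1. Extract amplitude/phase ONCE for the n unique points (the slow step)
amp, ph = pyTMD.io.FES.extract_constants(
    lon, lat, model_files, type="z", version="FES2014",
    method="spline", extrapolate=True, scale=1.0 / 100.0,
)
hc = amp * np.exp(-1j * ph * np.pi / 180.0)  # complex constants, (n_points, n_constituents)

# 2. Repeat/tile so every point-timestep pair re-uses the extracted constants
n_points, n_times = hc.shape[0], len(tide_time)  # tide_time: days since 1992-01-01
hc_tiled = np.repeat(hc, n_times, axis=0)
t_tiled = np.tile(tide_time, n_points)

# 3. The per-timestep harmonic sum is cheap; add minor constituents too
# (deltat=0.0 ignores the small dynamic-time correction for brevity)
tide = pyTMD.predict.drift(t_tiled, hc_tiled, constituents,
                           deltat=0.0, corrections="FES")
tide += pyTMD.predict.infer_minor(t_tiled, hc_tiled, constituents,
                                  deltat=0.0, corrections="FES")
```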
Then in the final step we wrap it up nicely as a labelled Pandas dataframe to make it easier to use. This solution gives a big speed-up if you have the following scenario:
It sounds like you may have far more points than us, though (assuming 19,000 was points only, not points x timesteps?). In that case, the ... One big upcoming update to our funcs that may be handy for you too is some work I've been doing on building in parallelisation using Dask/concurrent futures. Because the number of points is the bottleneck, I've been experimenting with parallelising across both different tide models and chunks of n points. This is looking really promising so far: run times for computing tides for 6 different tide models, a few hundred points, and 30 years of hourly tides have gone down from around 8 minutes to a minute or so. Hoping to release these improvements soon; I can let you know when it's ready!
(Although the NetCDF reading isn't as much of an issue with clipped data, one thing I'd really like to explore is whether the current approach in ...)
Hi @robbibt, thank you for explaining how ... works. We actually have 59 points, and each point either has data for which we need to correct the tide or has a ...
As you can see from a snippet of the dataframe below, not all the lat/lon points had values we needed to predict the tide for at every timestamp. Some lat/lon points only had values needing tide predictions at 3 timestamps, while other points had values at as many as 345 different timestamps. We've also been researching how to switch our project over to ... Thank you again for all your help!
I ran a comparison test of the old method we were using against the method used by model_tides. For 284 timestamps with a single point, the old method took 20.55 s. For 3 timestamps with a single point, the old method took 2.45 s. Clearly, not having to load the tide constituents makes a larger difference the more timestamps you have. This is going to save us so much time. Thank you again @robbibt!
Order of magnitude improvement, woo! Given that for your use case you don't always necessarily have exactly the same timesteps for each point, you could probably achieve something very similar by simply grouping your observations by x/y point, then only passing the unique groups to the constituent extraction, then joining those outputs back into the original shape of the data before passing back to ... (see the sketch below). I did see in another thread that you had issues installing ...
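A small sketch of that bookkeeping under stated assumptions: `obs` is a hypothetical DataFrame with one row per observation (`x`, `y`, `tide_time` columns), and `model_files`/`constituents` come from the FES2014 setup as in the earlier sketch:

```python
import numpy as np
import pandas as pd
import pyTMD.io
import pyTMD.predict

unique_pts = obs[["x", "y"]].drop_duplicates().reset_index(drop=True)

# slow step: extract constants only once per unique point
amp, ph = pyTMD.io.FES.extract_constants(
    unique_pts["x"].to_numpy(), unique_pts["y"].to_numpy(), model_files,
    type="z", version="FES2014", method="spline", extrapolate=True,
    scale=1.0 / 100.0,
)
hc = amp * np.exp(-1j * ph * np.pi / 180.0)

# cheap step: predict per point, at only that point's own timestamps
pieces = []
for i, pt in unique_pts.iterrows():
    rows = obs[(obs["x"] == pt["x"]) & (obs["y"] == pt["y"])]
    t = rows["tide_time"].to_numpy()  # days since 1992-01-01, as above
    tide = pyTMD.predict.drift(
        t, np.tile(hc[i], (len(t), 1)), constituents,
        deltat=0.0, corrections="FES",
    )
    pieces.append(rows.assign(tide=tide))

obs_with_tides = pd.concat(pieces).sort_index()  # restore original row order
```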
Robbi, thanks so much for your help.
No worries, it's super cool reading through this thread; nice to see others working through a similar (incredibly niche) problem space! (If you had the capacity down the track, I'd actually love to invite you all to give a talk to DEA/Geoscience Australia about some of the awesome stuff you're doing. We have a regular internal DEA Talks seminar series focusing on exciting science/technical work being done around the world, and this would be such a great fit!)
Robbi, we'd LOVE to give a talk to DEA/Geoscience Aus! We'd also love for you to give a talk to the USGS coastal group. We are far behind, but there is a lot of energy and enthusiasm now for satellite-based workflows.
Thanks for all your help, Robbi; we really appreciate it, and we will make sure to credit the repo when we publish the code. No worries about dea-tools, we found our way around that issue.
CoastSeg v0.0.73dev3
I'm going to close this issue because we have developed several scripts that allow the user to easily perform tidal corrections. I'll create a new issue for integrating this new method of tide correction into CoastSeg. I've got a full wiki on how to automatically correct tides: 3 Step Process to Automatic Tide Correction
@dbuscombe-usgs and @2320sharon, reading through this thread was useful (I had forgotten to flatten my x, y, and time arrays). Looking forward to speaking again soon.
Proposed Workflow