
Investigate pyTMD for tidal modeling #124

Closed
2320sharon opened this issue Apr 12, 2023 · 81 comments

@2320sharon
Collaborator

Proposed Workflow

  1. check if a transect is inside the FES2014 model domain (see the sketch after this list): https://pytmd.readthedocs.io/en/latest/api_reference/check_tide_points.html
  2. check if the tide files exist and, if not, download them: https://github.com/tsutterley/pyTMD/blob/main/scripts/aviso_fes_tides.py
  3. use https://pytmd.readthedocs.io/en/latest/api_reference/compute_tidal_elevations.html to generate tidal elevation data, then parse it into the CSV format that CoastSat expects
  4. use the CoastSat functions to correct shorelines for that transect
  5. repeat for each transect
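A minimal sketch of step 1, assuming the pyTMD.check_tide_points interface from the linked docs (the exact signature and keyword casing vary between pyTMD versions, and the coordinates here are illustrative):

import pyTMD

# seaward point of a transect (lon, lat), WGS84
x, y = [-117.446], [33.264]

# boolean array: True where the point falls inside the FES2014 model domain
valid = pyTMD.check_tide_points(x, y, DIRECTORY='tide_models',
    MODEL='FES2014', EPSG=4326, METHOD='spline')
print(valid)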
@2320sharon 2320sharon added enhancement New feature or request help wanted Extra attention is needed labels Apr 12, 2023
@2320sharon 2320sharon self-assigned this Apr 12, 2023
@dbuscombe-usgs
Member

dbuscombe-usgs commented Apr 26, 2023

I'm turning my attention to this issue of using pyTMD for tidal elevations, perhaps clipping the data by region for quicker data retrieval. Following up on some excellent notes from Robbi here: GeoscienceAustralia/dea-notebooks#989 (comment)

I have the FES2014 model files (this is a nice write-up). I see the implementations here and here

In an activated coastseg conda env with Python 3.9, I did a pip install dea-tools and pip install pytmd, then a conda install pyfes -c fbriol.

When I do from dea_tools.coastal import model_tides, I get an error related to pyproj (?)

ERROR 1: PROJ: internal_proj_create_from_database: /home/marda/miniconda3/envs/coastseg/share/proj/proj.db contains DATABASE.LAYOUT.VERSION.MINOR = 1 whereas a number >= 2 is expected. It comes from another PROJ installation.
Unexpected exception formatting exception. Falling back to standard exception
Traceback (most recent call last):
  File "rasterio/crs.pyx", line 586, in rasterio.crs.CRS.from_epsg
  File "rasterio/_err.pyx", line 195, in rasterio._err.exc_wrap_int
rasterio._err.CPLE_AppDefinedError: PROJ: internal_proj_create_from_database: /home/marda/miniconda3/envs/coastseg/share/proj/proj.db contains DATABASE.LAYOUT.VERSION.MINOR = 1 whereas a number >= 2 is expected. It comes from another PROJ installation.

Baby steps ... this seems fixable. I'll look into this more tomorrow. I note that pip complains about my numpy version. I'll start from scratch with a new conda env tomorrow and troubleshoot this.

I think tomorrow I'll open those NetCDF files up and do a deep dive on exactly how the info is packaged. Robbi notes "... so far the biggest impact on performance we've found is to clip the FES2014 tidal constituent NetCDFs to a bounding box around your study area (e.g. we clip them to Australia) rather than use the global NetCDFs out of the box (we've found it can reduce the time to load those constituents in pyTMD from minutes to just a few seconds)."

We should probably do the same, but for N. America. Further, I'd like to set things up so a new set of clipped files can be quickly generated and loaded for all world regions

@dbuscombe-usgs
Member

Ok, I think I have a solution

dependencies:

import xarray as xr
import json
import numpy as np
from glob import glob 

stuff for xarray

dtype = 'float64'
chunksize = ("auto", "auto")

get separate regions from file

regions_file = 'coastseg_tideregions_map.geojson'

with open(regions_file) as f:
    gj = json.load(f)
features = gj['features']
geometries = []
for feat in features:
    geometries.append(feat['geometry'])

print(f"{len(geometries)} tide regions")

Next we need a function that reads and clips each file, adapted from code here: https://github.com/GeoscienceAustralia/dea-notebooks/blob/f3d1ad36f0eb66421497c5fc5e594c1eee936675/Tools/dea_tools/climate.py#L170

def clip_and_write_new_nc_files(files):
    for counter, k in enumerate(files):
        print(f"working on variable {counter}")
        ds_disk = xr.open_dataset(k)

        ## cycle through each region
        for inner_counter, region in enumerate(geometries):
            print(f"working on region {inner_counter}")

            lon = np.array(region['coordinates'][0])[:, 0]
            lat = np.array(region['coordinates'][0])[:, 1]

            if min(lon) < 0:
                # the FES2014 grids use 0-360 longitudes; re-order along longitude
                # to run from -180 to 180 so western-hemisphere regions can be selected
                ds_disk = ds_disk.assign_coords({"lon": (((ds_disk.lon + 180) % 360) - 180)})
                ds_disk = ds_disk.reindex({"lon": np.sort(ds_disk.lon)})

            # Find existing coords between min & max
            lats = ds_disk.lat[np.logical_and(
                ds_disk.lat >= min(lat), ds_disk.lat <= max(lat))].values

            # If there was nothing between, just grab the closest
            if len(lats) == 0:
                lats = np.unique(ds_disk.lat.sel(lat=np.array(lat), method="nearest"))

            lons = ds_disk.lon[np.logical_and(
                ds_disk.lon >= min(lon), ds_disk.lon <= max(lon))].values

            if len(lons) == 0:
                lons = np.unique(ds_disk.lon.sel(lon=np.array(lon), method="nearest"))

            # crop, keeping global and per-variable attrs
            output = ds_disk.sel(lat=lats, lon=lons)
            output.attrs = ds_disk.attrs
            for var in output.data_vars:
                output[var].attrs = ds_disk[var].attrs

            output.to_netcdf(path=k.replace('.nc', f'_clipped_region{inner_counter}.nc'), mode='w')

Implement. This writes out 11 region files for each .nc tidal constituent

## get the load tide nc files
load_tide_nc_files = glob('tide_models/fes2014/load_tide/*.nc')

clip_and_write_new_nc_files(load_tide_nc_files)

## get the ocean tide nc files
ocean_tide_nc_files = glob('tide_models/fes2014/ocean_tide/*.nc')

clip_and_write_new_nc_files(ocean_tide_nc_files)

regions I'm using:

coastseg_tideregions_map.zip

@dbuscombe-usgs
Member

Next, I will attempt to use these files with pytmd for a prediction. For that, I will need to figure out the env stuff I mentioned yesterday

Assuming all goes well, we could prep CoastSeg to adopt this workflow:

  1. conda env
  2. add functionality to read geojson tidal regions file
  3. add functionality to query that file to see what region a transect is in
  4. now, a decision: do we clip the nc files on the fly, or store all the clipped nc files on disk? the unclipped 'load tide' files (34 in total) are 4.5GB. The clipped files for all 11 regions, interestingly, are only 3.1GB
  5. then, use the clipped files with pytmd for a tidal height prediction

any steps I'm missing?

@2320sharon
Collaborator Author

Thank you for figuring this out. I want to make sure I understand what you mean for each of these steps. For steps 2 & 3, do you mean we have a geojson file containing the bounds of each region with tide data, and we check whether the transect falls within one of those regions?

@2320sharon
Collaborator Author

Clipping all the nc files on the fly could cause a bottleneck. If we can find a way to store them, then fetch only the nc files we need for the small portion of the world we are working with, that would be ideal. It would be a similar workflow to the global shorelines workflow (which is another feature we should mention in the TCA meeting).

The only thing is I don't know if we can store the nc files somewhere online or not

@dbuscombe-usgs
Member

Yes, the file I shared has 11 regions. What I am proposing is that, for any transect, we know which region that transect is in.

Simplifying this by representing each transect by a single coordinate, let's say the end or midpoint, we need a good way to query a polygon to see if a point is inside. I believe there are ways to do this using shapely or geopandas, possibly others; see the sketch below.
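For example, a minimal sketch using a geopandas spatial join (the file and column names here are assumptions):

import geopandas as gpd

regions = gpd.read_file('coastseg_tideregions_map.geojson')   # 11 region polygons
points = gpd.read_file('transect_points.geojson')             # one point per transect

# attach the index of the containing region polygon to each point
joined = gpd.sjoin(points, regions, how='left', predicate='within')
print(joined['index_right'])   # region index for each transect point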

@dbuscombe-usgs
Member

The bottleneck isn't too bad for one region. It takes 1-2 mins to clip all 34 files for each region.

@dbuscombe-usgs
Member

dbuscombe-usgs commented Apr 26, 2023

I think the license would also allow storing a modified set of files online, but I don't think we should go this route.

@2320sharon
Collaborator Author

That's not too bad. Quick clarification question: I know we are clipping the 34 files to a single large region in the file you shared, but do we need to clip them again to the bounding box that the user draws?

@dbuscombe-usgs
Member

But I think my preference is local storage. Clipped files are smaller in size and will load fast. The only downside is running a script for each region that takes 1-2 mins to complete.

@dbuscombe-usgs
Member

Once the set of files has been clipped for a particular region, that doesn't need to happen again. Those files can live in a local directory. A set of files for one region is tiny.

@2320sharon
Collaborator Author

I think that can work. I guess I'm just worried the users won't want to download 3.1 GB of nc files. That being said, they have to with all the other pre-existing tools available, and it's much easier than with CoastSat.

@dbuscombe-usgs
Member

dbuscombe-usgs commented Apr 26, 2023

Users have to download two sets of 4.5GB of files regardless: load tide and ocean tide. There is no way around this; those files are under license with AVISO for each individual user. What we're proposing is that the user will store an ADDITIONAL set of files, totalling a few hundred MB for each region. 99% of users will only ever work in one region.

@dbuscombe-usgs
Member

And that's why I think this has to be a local workflow .... users store their own files locally

When we migrate to the cloud, the server is the individual user

@2320sharon
Collaborator Author

Oh! Thank you for clearing that up I thought we were replacing the two sets of 4.5GB of files

@dbuscombe-usgs dbuscombe-usgs self-assigned this Apr 26, 2023
@dbuscombe-usgs
Member

In a fresh coastseg env, a pip install dea-tools adds these:

Successfully installed GeoAlchemy2-0.13.2 OWSLib-0.29.1 affine-2.4.0 cftime-1.6.2 ciso8601-2.3.0 dask-image-2023.3.0 datacube-1.8.12 dea-tools-0.2.7 distributed-2023.4.0 geographiclib-2.0 geopy-2.3.0 greenlet-2.0.2 jupyter-ui-poll-0.2.2 lark-1.1.5 lxml-4.9.2 msgpack-1.0.5 netcdf4-1.6.3 numexpr-2.8.4 odc-algo-0.2.3 odc-ui-0.2.0a3 pims-0.6.1 psycopg2-2.9.6 rasterio-1.3.6 rasterstats-0.18.0 ruamel.yaml-0.17.21 ruamel.yaml.clib-0.2.7 shapely-2.0.1 simplejson-3.19.1 slicerator-1.1.0 snuggs-1.4.7 sortedcontainers-2.4.0 sqlalchemy-1.4.47 tblib-1.7.0 xarray-2023.4.2 zict-3.0.0

and a pip install pytmd adds these (with a warning about scipy):

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
scikit-image 0.20.0 requires scipy<1.9.2,>=1.8; python_version <= "3.9", but you have scipy 1.10.1 which is incompatible.
Successfully installed pytmd-2.0.3 scipy-1.10.1 setuptools-scm-7.1.0

next, conda install pyfes -c fbriol results in the following changes:

The following packages will be DOWNGRADED:

  _libgcc_mutex                                    0.1-main --> 0.1-conda_forge 
  fiona                               1.8.20-py39h427c1bf_2 --> 1.8.20-py39h427c1bf_1 
  gdal                                 3.3.2-py39h218ed2d_4 --> 3.3.1-py39h218ed2d_3 
  geotiff                                  1.7.0-hcfb7246_3 --> 1.6.0-h4f31c25_6 
  libgdal                                  3.3.2-h9c9eb65_4 --> 3.3.1-h6214c1d_3 
  libspatialite                            5.0.1-h8796b1e_9 --> 5.0.1-h8694cbe_6 
  poppler                                21.09.0-ha39eefc_3 --> 21.03.0-h93df280_0 

At first glance, nothing here seems majorly wrong. This conda env is able to execute the tide clipping functions I posted above.

Ok, next I do from dea_tools.coastal import model_tides, but I get an error related to another PROJ installation:

ERROR 1: PROJ: internal_proj_create_from_database: /home/marda/miniconda3/envs/coastseg/share/proj/proj.db contains DATABASE.LAYOUT.VERSION.MINOR = 0 whereas a number >= 2 is expected. It comes from another PROJ installation.
---------------------------------------------------------------------------
CPLE_AppDefinedError                      Traceback (most recent call last)
File rasterio/crs.pyx:586, in rasterio.crs.CRS.from_epsg()

File rasterio/_err.pyx:195, in rasterio._err.exc_wrap_int()

CPLE_AppDefinedError: PROJ: internal_proj_create_from_database: /home/marda/miniconda3/envs/coastseg/share/proj/proj.db contains DATABASE.LAYOUT.VERSION.MINOR = 0 whereas a number >= 2 is expected. It comes from another PROJ installation.

...

File rasterio/crs.pyx:590, in rasterio.crs.CRS.from_epsg()

CRSError: The EPSG code is unknown. PROJ: internal_proj_create_from_database: /home/marda/miniconda3/envs/coastseg/share/proj/proj.db contains DATABASE.LAYOUT.VERSION.MINOR = 0 whereas a number >= 2 is expected. It comes from another PROJ installation.

This may be because I'm using miniconda, not anaconda. I have rasterio version 1.3.6, pyproj 3.3.1, geopandas 0.12.2, and gdal 3.3.1. The error seems to come from rasterio. If I do a conda install rasterio, I get the following new packages:

The following NEW packages will be INSTALLED:

  affine             pkgs/main/noarch::affine-2.3.0-pyhd3eb1b0_0 
  rasterio           conda-forge/linux-64::rasterio-1.2.8-py39hbc4e497_0 
  snuggs             pkgs/main/noarch::snuggs-1.4.7-pyhd3eb1b0_0 

but I get an error importing it: ImportError: rasterio._base does not export expected C function osr_set_traditional_axis_mapping_strategy
I'm attempting to recover from this, but now I can't seem to install a version of rasterio that doesn't error when imported. When I nuke rasterio completely and import dea-tools, I get errors saying it can't find Affine even though I can see that it is installed ..... vexing

@dbuscombe-usgs
Member

Changing tactics a little to use pyTMD directly (at least for now, bypassing dea-tools)

import numpy as np
import pyTMD

lons = [155, 160]
lats = [-35, -36]
time_deltas = [1, 1]  # times relative to the EPOCH below (TIME='GPS')

tide = pyTMD.compute_tide_corrections(lons, lats, time_deltas, DIRECTORY='/home/marda/Downloads/tide_models/',
    MODEL='FES2014', EPSG=4326, EPOCH=(2000,1,1,0,0,0), TYPE='drift', TIME='GPS',
    METHOD='spline', FILL_VALUE=np.nan)

This takes a while, but it works, I think. The output is:

masked_array(data=[-0.30941544812285304, -0.3676034391483177],
             mask=[False, False],
       fill_value=nan)

@dbuscombe-usgs
Member

I now have a prototype workflow for accessing pyTMD predictions at arbitrary places, using clipped .nc files 🎉

import pyTMD
import json 
import numpy as np
from shapely.geometry import Point, Polygon

Next we need two sets of vectors - the 11 regions, and some test points

test_points.zip

######### get test points from file
points_file = 'test_points.geojson'

with open(points_file) as f:
    gj = json.load(f)
pt_geometries = []
for feat in gj['features']:
    pt_geometries.append(feat['geometry'])

print(f"{len(pt_geometries)} test points")

######### get regions from file
regions_file = 'coastseg_tideregions_map.geojson'

with open(regions_file) as f:
    gj = json.load(f)
region_geometries = []
for feat in gj['features']:
    region_geometries.append(feat['geometry'])

print(f"{len(region_geometries)} tide regions")

Next I cycle through each point, find the region it is in, and get a tide estimate for that region. It is super quick!

for k in range(len(pt_geometries)):
    pt = pt_geometries[k]['coordinates']
    print(f"Evaluating {pt}")
    lon, lat = pt[0], pt[1]

    ## which region is this point in?

    # Create a Point object for the test location
    p1 = Point(lon, lat)

    for counter, r in enumerate(region_geometries):
        # Build a polygon from the region's exterior ring
        poly = Polygon([tuple(c) for c in r['coordinates'][0]])

        if p1.within(poly):
            print(f"Point is in region {counter}")
            region = counter

            tide = pyTMD.compute_tide_corrections(lon, lat, 1, DIRECTORY=f'/home/marda/Downloads/clipped_tide_models/region{region}',
                MODEL='FES2014', EPSG=4326, EPOCH=(2000,1,1,0,0,0), TYPE='drift', TIME='GPS',
                METHOD='spline', FILL_VALUE=np.nan)

            print(tide)

@2320sharon
Collaborator Author

I'm testing out this code, and I thought I'd leave my recipe for setting up my environment for future reference. In an Anaconda prompt:

  1. conda create -n tide_modeling python=3.9
  2. conda activate tide_modeling
  3. pip install pytmd
  4. pip install matplotlib
  5. conda install -c conda-forge geopandas -y (because I got an error about gdal, and this usually fixes it)

@2320sharon
Collaborator Author

@dbuscombe-usgs
So I'm trying out the code here, but I'm getting stuck on DIRECTORY=f'/home/marda/Downloads/clipped_tide_models/region{region}'. Are your clipped models organized like /home/marda/Downloads/clipped_tide_models/region{region}/fes2014/load_tide and /home/marda/Downloads/clipped_tide_models/region{region}/fes2014/ocean_tide, with the clipped files for that region in each of the load_tide and ocean_tide directories?


@2320sharon
Collaborator Author

2320sharon commented May 3, 2023

[screenshot of directory layout]
Here is how mine are organized, with the clipped files for each region in each subdirectory.
Edit: this format is not correct.

@2320sharon
Collaborator Author

Good news! I was able to get this code to run successfully a few times. When you ran the code, did you find that some tides were bad? For example, some tides printed tide: [--] type: <class 'numpy.ma.core.MaskedArray'> while others were normal. I looked at the code and saw that the pyTMD.compute_tide_corrections function will mask out bad tides.
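For reference, masked entries can be detected with numpy's masked-array helpers (a small sketch; tide is assumed to be the output of pyTMD.compute_tide_corrections):

import numpy as np

bad = np.ma.getmaskarray(tide)    # True where pyTMD masked the prediction
good = tide.compressed()          # only the unmasked tide values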

@2320sharon
Collaborator Author

2320sharon commented Jun 2, 2023

tide_modeling_workflow.zip

Here are the scripts and files I used to test the FES2014 model.

The zip file contains the following:

Data Files

  1. test_points.geojson: a series of points all around the world to test the tide model with
  2. tide_regions_map.geojson: bounding boxes for each tide region around the world. The tide model is clipped to each region so that predictions are faster
  3. tides_for_points.json: a json file that is the output of the testing_matrix_for_bad_tides.py script that tried to find the tide for all the points in points_with_no_tide.json by increasing and decreasing the radius for the tide location and the time.
  4. points_with_no_tide.json: The geometry for each point that didn't have a tide as found by the isolate_points_with_no_tide.py script.

Tide Model Setup Scripts

  • If you already have the tide model set up and clipped to the regions, there is no need to run these files
  1. aviso_fes_tides.py: used to download the fes2014 model
  2. clip_and_write_new_nc_files_cmd_line.py: a command line script I made that clips the downloaded tide model to the tide regions specified in the tide_regions_map.geojson
  • the instructions are at the top of the script

Tide Testing Files

  1. prototype_model_tides_coastseg.py: For each point in test_points.geojson attempts to find the tide and prints it.
  2. isolate_points_with_no_tide.py: a script used to isolate only the points that didn't have a tide and save them to a file named points_with_no_tide.json
  3. testing_matrix_for_bad_tides.py: A script used to find the tide for all the points in points_with_no_tide.json by increasing and decreasing the radius for the tide location and the time.

Recipe for the Environment

conda create -n tide_modeling python=3.10 -y
conda activate tide_modeling
pip install pyTMD matplotlib
conda install -c conda-forge geopandas xarray netCDF4

@2320sharon
Collaborator Author

2320sharon commented Jul 11, 2023

For the points that don't have a tide, we should set the extrapolate parameter to True. This will allow the model to "guess" the approximate tide level for that lat/lon value.
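For example, something like this (a sketch; pyTMD's keyword casing and the CUTOFF value may differ by version):

tide = pyTMD.compute_tide_corrections(lon, lat, 1, DIRECTORY='tide_models',
    MODEL='FES2014', EPSG=4326, TYPE='drift', TIME='GPS', METHOD='spline',
    EXTRAPOLATE=True, CUTOFF=10.0,  # extrapolate up to ~10 km from valid model cells
    FILL_VALUE=np.nan)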

@2320sharon
Collaborator Author

But even this idea does not work for every point.

@2320sharon
Collaborator Author

I did a bit of digging into dea_coastlines' code and a few other repositories that use pyTMD, and I found that they implement the logic of pyTMD.compute_tide_corrections a bit differently. I will outline the key differences in the model_tides function in dea_notebooks below:

  1. Instead of using pyTMD.compute_tide_corrections to run the tidal prediction for each point, they build their own custom function, which does this faster.
  2. To avoid the slowness of loading the same model files each time tides are predicted, in the model_tides function the model is loaded only once. This line is very slow because it loads the FES2014 NetCDF files:
    # FES2012 (leave this as an undocumented feature for now)
    if model == "FES2012":
        model = pyTMD.io.model(directory).from_file(directory / "model_FES2012.def")
    else:
        model = pyTMD.io.model(directory, format="netcdf", compressed=False).elevation(
            model
        )
  3. They also extract the constants only once:
    elif model.format == "FES":
        amp, ph = pyTMD.io.FES.extract_constants(
            lon,
            lat,
            model.model_file,
            type=model.type,
            version=model.version,
            method=method,
            extrapolate=extrapolate,
            cutoff=cutoff,
            scale=model.scale,
            compressed=model.compressed,
        )
  4. They predict the tides using the drift model with these lines:
    # Predict tides
    tide.data[:] = pyTMD.predict.drift(
        t, hc, c, deltat=deltat, corrections=model.format
    )
    minor = pyTMD.predict.infer_minor(t, hc, c, deltat=deltat, corrections=model.format)
    tide.data[:] += minor.data[:]

    # Replace invalid values with fill value
    tide.data[tide.mask] = tide.fill_value

I'm going to make a sample script that mimics what the model_tides function does but only includes the models relevant to us, to see if it performs better.
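Roughly like this, stitched together from the dea_notebooks excerpts above (a sketch, not a drop-in implementation: lon/lat are the modelling points, and t and deltat are assumed to already be pyTMD tide time and its time correction):

import numpy as np
import pyTMD

# load the model definition once (the slow NetCDF reads happen behind this)
model = pyTMD.io.model('tide_models', format='netcdf', compressed=False).elevation('FES2014')

# extract amplitude and phase once per point (the other slow step)
amp, ph = pyTMD.io.FES.extract_constants(lon, lat, model.model_file,
    type=model.type, version=model.version, method='spline',
    extrapolate=True, scale=model.scale, compressed=model.compressed)

# complex harmonic constants from amplitude and phase lag
hc = amp * np.exp(-1j * ph * np.pi / 180.0)

# predict tides for every timestep, re-using the extracted constants
tide = pyTMD.predict.drift(t, hc, model.constituents, deltat=deltat, corrections=model.format)
tide += pyTMD.predict.infer_minor(t, hc, model.constituents, deltat=deltat, corrections=model.format)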

@dbuscombe-usgs
Member

Awesome!

@2320sharon
Collaborator Author

2320sharon commented Jul 25, 2023

Upon deeper inspection of the model_tides script, I realized it's basically just as effective as calling pyTMD.compute_tide_corrections with all the points, which is what we have currently implemented. It's still far better than calling pyTMD.compute_tide_corrections over and over for each point, but it isn't better than what we currently have implemented in the script above. The model_tides function is basically pyTMD.compute_tide_corrections re-written, with the only change being that it gets the tide for every combination of x, y, and time.

Unfortunately, this is not the breakthrough I thought it was, but on the plus side I just learned a lot about how the tide computation works.

@2320sharon
Collaborator Author

2320sharon commented Jul 25, 2023

Edit: the number of unique tides generated by this script was 18,911 out of 18,967 tides, so maybe this isn't such a great idea.

I think we should consider reducing the number of tide predictions we make. If we can group locations that are close together at the same time, then we should use the same tide prediction for them to reduce the number of tides we need to predict.

Here is part of the output from the first script:
tidal_predicitions_for_seaward_points.csv
As you can see, the points (33.26381365156523,-117.4461922930749) and (33.264553936390215,-117.4467896291754) are very close together spatially, and their tides are identical.

lat,lon,time,tide_height
33.26381365156523,-117.4461922930749,2015-08-23 18:44:33+00:00,0.042549690241335314 # this line is identical
33.26381365156523,-117.4461922930749,2018-07-03 18:39:45+00:00,-0.027200095752740242
33.26381365156523,-117.4461922930749,2021-04-23 18:45:09+00:00,-0.6144012290161337
33.264553936390215,-117.4467896291754,2015-05-12 18:21:35+00:00,-0.7131177901879379
33.264553936390215,-117.4467896291754,2015-07-31 18:22:10+00:00,0.3838909021356087
33.264553936390215,-117.4467896291754,2015-08-03 18:44:32+00:00,0.6221951655126738
33.264553936390215,-117.4467896291754,2015-08-23 18:44:33+00:00,0.042540340329560986  # this line is identical

@2320sharon
Collaborator Author

Hey @robbibt!
I know you've dealt with this issue before for dea_notebooks. Did you find any other method, besides passing in all the points and times at once, to speed things up? Currently when we pass in all of our points (about 19,000) it takes about 25 minutes to get all the predictions.

@dbuscombe-usgs
Member

(We are using a clipped version of the .nc files; the region is clipped to just North America.)

@dbuscombe-usgs
Member

@2320sharon another thing to explore is reducing the number of points. Instead of passing it the location of every transect, perhaps we should group transects, find the centroid location of each group, then compute tides for just that location. We'd then assign that tide value to every transect in the group.

As for grouping, the effective spatial resolution of the tide prediction is probably close to the value specified by CUTOFF, i.e. 10 km. We have transects every 100 m, so there is quite a bit of interpolation/extrapolation going on... perhaps we should group all transects within, say, 1 km, use the method I outlined above, and see how long that takes? See the sketch below.
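A rough sketch of that grouping, e.g. with scikit-learn's KMeans on the transect seaward points (the input file and the 10-transects-per-cluster heuristic are assumptions):

import numpy as np
import geopandas as gpd
from sklearn.cluster import KMeans

points = gpd.read_file('seaward_points.geojson')   # one point per transect
xy = np.column_stack([points.geometry.x, points.geometry.y])

# at ~100 m transect spacing, ~10 transects per group approximates 1 km clusters
n_clusters = max(1, len(points) // 10)
points['group'] = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(xy)

# one tide-model location per group: the group centroid
centroids = points.dissolve('group').centroid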

@dbuscombe-usgs
Member

One solution: "The first one would involve grouping the transects together by something (distance probably), then getting the centroid of that group and using that centroid point to compute the tides."

What's the second?

@2320sharon
Collaborator Author

I like the idea behind this solution. We group the transects together by 1 km (and I'll try grouping by 5 km as well), then get the centroid of each group and use that centroid point to compute the tides. This approach is very similar to what dea_notebooks has implemented, the difference being that they take the centroid of the entire dataset for a single ROI.

@2320sharon
Collaborator Author

Sorry, I misread your comment and went back to fix my response.

@dbuscombe-usgs
Member

Cool, I think this makes sense ....

@2320sharon
Collaborator Author

By any chance do you have any recommendation on how to group the transects together?

@2320sharon
Collaborator Author

[screenshot of seaward points]
Here are the seaward points for all the transects in this ROI. They are evenly spaced, so I thought I'd break them into N groups.

@dbuscombe-usgs
Member

I'll investigate

@2320sharon
Collaborator Author

[screenshot of clustered points]
I couldn't figure out how to group them into 1 km groups, so I declared the number of clusters I wanted and grouped them that way.

@dbuscombe-usgs
Member

that works!

@robbibt

robbibt commented Jul 26, 2023

Hey @robbibt! I know you've dealt with this issue before for dea_notebooks. Did you find any other method, besides passing in all the points and times at once, to speed things up? Currently when we pass in all of our points (about 19,000) it takes about 25 minutes to get all the predictions.

Hi all, I'm a little late to this thread so apologies if you've already resolved this! This does sound like a very similar issue to what we encountered, and essentially the motivation behind having a separate model_tides func that re-implements pyTMD.compute_tide_corrections. We found there were two main bottlenecks:

  1. Loading the tide model file constituents from NetCDF (for which a decent solution was clipping the input NetCDFs)
  2. Extracting tidal constituents at each modelling point

Number 2 was particularly problematic for our use case because using pyTMD.compute_tide_corrections in "drift" mode would extract the tidal constituents for every lat/lon + timestep pair... which was incredibly slow! For us, we usually have a small number of known tide modelling locations (e.g. a coarse 5 x 5 km grid within our study areas), then want to model tides at every single satellite observation through time (up to hundreds or thousands of timesteps at each point). So you'd end up having to extract the same constituents hundreds/thousands of times more than necessary (i.e. the extractions would be repeated for every timestep, even if the point was the same).

The solution for this was to only extract the constituents once per point, and then re-use them when modelling tides for every individual timestep. The key part is here: we essentially repeat/tile our extracted t, hc, deltat for each point by the total number of timesteps, before passing them to pyTMD.predict.drift and pyTMD.predict.infer_minor:
https://github.com/GeoscienceAustralia/dea-notebooks/blob/develop/Tools/dea_tools/coastal.py#L443-L449

    t, hc, deltat = (
        np.tile(timescale.tide, n_points),
        hc.repeat(n_times, axis=0),
        np.tile(deltat, n_points),
    )
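(To make the alignment concrete: with, say, 3 points and 4 timesteps, np.tile turns timescale.tide of shape (4,) into (12,), while hc.repeat(4, axis=0) turns the (3, n_constituents) constants array into (12, n_constituents), so each row of the tiled arrays lines up with exactly one point/time pair.)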

Then in the final step we wrap it up nicely as a labelled Pandas dataframe to make it easier to use:
https://github.com/GeoscienceAustralia/dea-notebooks/blob/develop/Tools/dea_tools/coastal.py#L466-L473

This solution gives a big speed up if you have the following scenario:

  • A relatively small number of points
  • A very large number of timesteps repeated at each and every point (meaning you can re-use the same extracted constituents many times).

It sounds like you may have far more points than us though (assuming 19,000 was the points only, not points x timesteps?). In that case, the model_tides approach should still help, but might not give you the same major speedup it does for us - your idea to cluster the points spatially seems like a good way to optimise it further!

One big upcoming update to our funcs that may be handy for you too is some work I've been doing on building in parallelisation using Dask/concurrent futures - because the number of points is the bottleneck, I've been experimenting with parallelising across both different tide models and chunks of n points. This is looking really promising so far: run times for computing tides for 6 different tide models, a few hundred points, and 30 years of hourly tides have gone down from around 8 minutes to a minute or so. Hoping to release these improvements soon - I can let you know when it's ready!

@robbibt

robbibt commented Jul 26, 2023

(Although the NetCDF reading isn't as much of an issue with clipped data, one thing I'd really like to explore is whether the current approach in pyTMD could be enhanced by lazily loading the NetCDFs using xarray, then only computing the data at the locations you need constituents for, rather than having to load the entire dataset into memory first. This could potentially avoid the need to clip the data at all, which would simplify things for us, especially as it looks like we're increasingly shifting to a multi-tide-model setup where we adaptively select the best model for each pixel based on our satellite data.)


@2320sharon
Collaborator Author

Hi @robbibt

Thank you for explaining how model_tides works and clearing up a misunderstanding. I didn't understand why you were using the functions np.tile and repeat, and I incorrectly believed you did this only to get every combination of time and point. I see now that you did all this to avoid re-calculating the tidal constituents. Thank you so much for clearing that up.

We actually have 59 points, and each point either has data we need to tide-correct or has an np.NaN value. I got the 19,000 points from (number of points with values x timestamps). I see now that I should be repeating the tidal constituents for every lat/lon point instead of re-calculating them for the same lat/lon. I might have to modify the logic in the code so that it repeats the tidal constituents for the same points with different timestamps.

    t, hc, deltat = (
        np.tile(timescale.tide, n_points),
        hc.repeat(n_times, axis=0),
        np.tile(deltat, n_points),
    )

As you can see from the dataframe snippet below, not all the lat/lon points needed a tide prediction at every timestamp: some points needed predictions at only 3 timestamps, while others needed them at as many as 345 different timestamps.
[screenshot of dataframe]

We've also been researching how to switch our project over to Dask and xarray. We will have to take a look at the new funcs when you're ready; they sound quite promising.

Thank you again for all your help!

@2320sharon
Collaborator Author

I ran a comparison test of the old method we were using against the method used by model_tides.

For 284 timestamps with a single point the old method took: 20.55s
For 284 timestamps with a single point the model_tides took: 2.33s

For 3 timestamps with a single point the old method took: 2.45s
For 3 timestamps with a single point the model_tides took: 2.29s

Clearly, not having to re-extract the tide constituents makes a larger difference the more timestamps you have. This is going to save us so much time. Thank you again @robbibt!

@robbibt

robbibt commented Jul 27, 2023

For 284 timestamps with a single point the old method took: 20.55s
For 284 timestamps with a single point the model_tides took: 2.33s

Order of magnitude improvement, woo! Given that for your use case you don't always necessarily have exactly the same timesteps for each point, you could probably achieve something very similar by simply grouping your observations by x/y/point, then passing only the unique groups to the constituent extraction, then joining those outputs back into the original shape of the data before passing them back to pyTMD.predict.drift/pyTMD.predict.infer_minor (rather than tiling/repeating by a constant n timesteps).
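A small pandas sketch of that grouping idea (the table and column names are assumptions, not dea-tools or pyTMD API):

import pandas as pd

# toy observation table: one row per (point, time) needing a prediction
obs = pd.DataFrame({
    'x': [155.0, 155.0, 160.0],
    'y': [-35.0, -35.0, -36.0],
    'time': ['2015-08-23', '2018-07-03', '2015-08-23'],
})

# unique modelling points, each given an id
unique_pts = obs[['x', 'y']].drop_duplicates().reset_index(drop=True)
unique_pts['pt_id'] = unique_pts.index

# extract constituents ONCE per unique point here (e.g. with
# pyTMD.io.FES.extract_constants), keyed by pt_id ...

# ... then map every observation back to its point's constituents
obs = obs.merge(unique_pts, on=['x', 'y'], how='left')
print(obs)   # each row now carries a pt_id to index the extracted hc rows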

I did see in another thread that you had issues installing dea-tools - apologies! We'll try to make this easier in the future - for now, all our material is completely open under CC BY 4.0/Apache 2.0, so feel free to copy out any relevant funcs or content you need; some kind of reference back to the repo would be awesome. 🙂

Krause, C., Dunn, B., Bishop-Taylor, R., Adams, C., Burton, C., Alger, M., Chua, S., Phillips, C., Newey, V., Kouzoubov, K., Leith, A., Ayers, D., Hicks, A., DEA Notebooks contributors 2021. Digital Earth Australia notebooks and tools repository. Geoscience Australia, Canberra. https://doi.org/10.26186/145234

@dbuscombe-usgs
Member

Robbi, thanks so much for your help. dea-tools (and all you do there at DEA) continues to be a huge help and inspiration!

@robbibt

robbibt commented Jul 27, 2023

No worries - it's super cool reading through this thread, nice to see others working through a similar (incredibly niche) problem space!

(If you had the capacity down the track, I'd actually love to invite you all to give a talk to DEA/Geoscience Aus about some of the awesome stuff you're doing - we have a regular internal DEA Talks seminar series focusing on exciting science/technical work being done around the world, and this would be such a great fit!)

@dbuscombe-usgs
Member

Robbi, we'd LOVE to give a talk to DEA/Geoscience Aus! We'd also love for you to give a talk to the USGS coastal group - we are far behind, but there is a lot of energy and enthusiasm now for satellite-based workflows.

@2320sharon
Collaborator Author

Thanks for all your help, Robbi. We really appreciate it, and we will make sure to credit the repo when we publish the code.

No worries about dea-tools; we found our way around that issue.

@2320sharon
Collaborator Author

CoastSeg v0.0.73dev3

I'm going to close this issue because we have developed several scripts that allow the user to easily perform tidal corrections. I'll create a new issue for integrating this new method of tide correction into CoastSeg.

I've got a full wiki on how to automatically correct tides:
https://github.com/Doodleverse/CoastSeg/wiki/Download-Tide-Model-Guide-(Under-Construction)

3 Step Process to Automatic Tide Correction

  1. Automatically download the FES2014 model: aviso_fes_tides.py (perform only once)
  2. Clip the FES2014 model to regions of the world: clip_and_write_new_nc_files_cmd_line.py (perform only once)
  3. Automatically compute tidal predictions for a time series CSV and write a tidally corrected version of that CSV: apply_tidal_correction.py

@mhutchinson-woolpert

@dbuscombe-usgs and @2320sharon, reading through this thread was useful (I had forgotten to flatten my x, y, and time arrays). Looking forward to speaking again soon.
