Netcdf Output #714

donaldwj · 2024-01-24T16:27:16Z

Currently, the only outputs generated by the Ngen framework are per-catchment CSV files with streamflow, plus any output files created by the formulation libraries (of which there should be none). For operational as well as community use, NetCDF output and output of variables other than streamflow are necessary.

Current behavior

Infiltration excess is given for each catchment

Expected behavior

Streamflow output at each nexus as well as additional grid or per nexus variables as defined by the user

Proposed High level design

User-configurable list of output variables
Each variable creates a separate NetCDF output
Catchment variables are output as a table with dimensions [1, num-catchments, num-times (unlimited?)]; a second variable records which catchment id is stored at each location in the main variable.
Nexus variables are output as a table with dimensions [1, num-nexuses, num-times (unlimited?)]; a second variable records which nexus id is stored at each location in the main variable.
Gridded variables are output as a table with dimensions [xsize, ysize, num-times (unlimited?)]; each location stores the output value.
Mesh outputs should be supported, but some thought is needed on how to store data for loading into common mesh formats
Runtime flag to write a separate output file for each time step (needed to duplicate operational files)
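The "table plus id variable" layout for catchment and nexus outputs can be sketched as follows. This is only an illustration under assumptions: the function name and the sorted-id ordering are hypothetical and not part of the proposal, and plain lists stand in for NetCDF variables.

```python
from typing import Dict, List, Tuple


def build_catchment_table(
    values_by_id: Dict[str, List[float]],
) -> Tuple[List[str], List[List[float]]]:
    """Return (ids, table), where table[i] is the time series for ids[i].

    `ids` plays the role of the second variable that records which
    catchment id is stored at each location of the main variable.
    """
    ids = sorted(values_by_id)  # assumed ordering: any fixed order would do
    lengths = {len(values_by_id[i]) for i in ids}
    if len(lengths) > 1:
        raise ValueError("all catchments must have the same number of time steps")
    table = [list(values_by_id[i]) for i in ids]
    return ids, table


ids, table = build_catchment_table(
    {"cat-52": [0.1, 0.2], "cat-27": [0.0, 0.3]}
)
```

A reader of the file would look up the row for a catchment with `ids.index("cat-52")` and then index the main table with it.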

Proposed Json for output variable description

Tag	Data Type	Allowed Values	Parent	Notes
var	String	Name of output variable	None
type	String	"catchment", "grid", "mesh", "nexus"	var
units	String	Unit description usable by UDUnits	var
dims	Integer	2, 3	type (grid)	Whether the grid is two- or three-dimensional
size	Tuple	(x1, x2)	type (grid)	The size of the grid along each grid axis
origin	Tuple	(x1, x2), (x1, x2, x3)	type (grid)	Start point of the grid
origin_loc	String	"BottomLeft", "BottomRight", "TopLeft", "TopRight"	type (grid)	Which corner is the start point
proj	String	Valid projection string usable by ECMF	type
wkt	String	Well-known text string for a projection	type (grid)	Replaces most (all?) other grid-specific tags
major_axis	Tuple	(x1, x2), (x1, x2, x3)	type (grid)	The vector for moving in the first direction on the grid
minor_axis	Tuple	(x1, x2), (x1, x2, x3)	type (grid)	The vector for moving in the second direction on the grid
mesh_file	String	Path to an ECMF saved mesh description	type (mesh)	ECMF mesh file
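As a rough illustration of how the tags above might be checked, here is a minimal validation sketch. It assumes each variable description is a flat dictionary keyed by tag name; the issue does not specify the actual nesting, and the function and constant names are hypothetical.

```python
from typing import List

ALLOWED_TYPES = {"catchment", "grid", "mesh", "nexus"}
ORIGIN_LOCS = {"BottomLeft", "BottomRight", "TopLeft", "TopRight"}
# Tags whose parent in the table is "type (grid)".
GRID_ONLY_TAGS = {
    "dims", "size", "origin", "origin_loc", "wkt", "major_axis", "minor_axis",
}


def validate_variable(desc: dict) -> List[str]:
    """Return a list of problems found in one variable description."""
    errors = []
    if not isinstance(desc.get("var"), str):
        errors.append("var must be a string")
    var_type = desc.get("type")
    if var_type not in ALLOWED_TYPES:
        errors.append(f"type must be one of {sorted(ALLOWED_TYPES)}")
    if var_type == "grid":
        if "dims" in desc and desc["dims"] not in (2, 3):
            errors.append("dims must be 2 or 3")
        if "origin_loc" in desc and desc["origin_loc"] not in ORIGIN_LOCS:
            errors.append("origin_loc is not a recognized corner")
    else:
        for tag in sorted(GRID_ONLY_TAGS.intersection(desc)):
            errors.append(f"{tag} is only valid for grid variables")
    if "mesh_file" in desc and var_type != "mesh":
        errors.append("mesh_file is only valid for mesh variables")
    return errors


grid_var = {"var": "soil_moisture_content", "type": "grid", "dims": 2,
            "size": (1000, 500), "origin": (0.0, 0.0), "origin_loc": "BottomLeft"}
nexus_var = {"var": "streamflow", "type": "nexus", "dims": 2}
```

Here `validate_variable(grid_var)` reports no problems, while `validate_variable(nexus_var)` flags `dims` as a grid-only tag.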

Example configurations

{
    "global": {
      "formulations": [
        {
          "name": "bmi_c++",
          "params": {
            "model_type_name": "test_bmi_cpp",
            "library_file": "./extern/test_bmi_cpp/cmake_build/libtestbmicppmodel.so",
            "init_config": "./data/bmi/c/test/test_bmi_c_config.ini",
            "main_output_variable": "OUTPUT_VAR_2",
            "variables_names_map" : {
              "INPUT_VAR_2": "TMP_2maboveground",
              "INPUT_VAR_1": "precip_rate"
            },
            "create_function": "bmi_model_create",
            "destroy_function": "bmi_model_destroy",
            "uses_forcing_file": false
          }
        }
      ],
      "forcing": {
          "file_pattern": ".*{{id}}.*.csv",
          "path": "./data/forcing/"
      }
    },
    "time": {
        "start_time": "2015-12-01 00:00:00",
        "end_time": "2015-12-30 23:00:00",
        "output_interval": 3600
    },
    "outputs" : {
      "nc_file1" : {
        "name" : "netcdf_output_1.nc",
        "type" : "NetCDF4",
        "dimensions" : {
          "X" : {
            "size" : 1000
          },
          "Y" : {
            "size" : 500
          },
          "Z" : {
            "size" : 100
          }
        },
        "variables" : {
          "infiltration_excess" : {
            "type" : "float",
            "dimensions" : "X, Y"
          },
          "soil_moisture_content" : {
            "type" : "float",
            "dimensions" : "X, Y"
          }
        }
      },
      "nc_file2" : {
        "name" : "netcdf_output_2.nc",
        "type" : "NetCDF4",
        "dimensions" : {
          "D1" : {
            "size" : 10
          },
          "D2" : {
            "size" : 100
          },
          "D3" : {
            "size" : 1000
          },
          "D4" : {
            "size" : 10000
          }
        },
        "variables" : {
          "var1" : {
            "type" : "float",
            "dimensions" : "D1"
          },
          "var2" : {
            "type" : "int",
            "dimensions" : "D2"
          },
          "var3" : {
            "type" : "char",
            "dimensions" : "D3"
          },
          "var4" : {
            "type" : "int64",
            "dimensions" : "D4"
          }      
        }
      }
    },
    "catchments": {
        "cat-27": {
          "formulations": [
            {
              "name": "bmi_c++",
              "params": {
                "model_type_name": "test_bmi_cpp",
                "library_file": "./extern/test_bmi_cpp/cmake_build/libtestbmicppmodel.so",
                "init_config": "./data/bmi/c/test/test_bmi_c_config.ini",
                "main_output_variable": "OUTPUT_VAR_2",
                "variables_names_map" : {
                  "INPUT_VAR_2": "TMP_2maboveground",
                  "INPUT_VAR_1": "precip_rate"
                },
                "create_function": "bmi_model_create",
                "destroy_function": "bmi_model_destroy",
                "uses_forcing_file": false
              }
            }
          ],
          "forcing": {
              "path": "./data/forcing/cat-27_2015-12-01 00_00_00_2015-12-30 23_00_00.csv"
          }
        },
        "cat-52": {
          "formulations": [
            {
              "name": "bmi_c++",
              "params": {
                "model_type_name": "test_bmi_cpp",
                "library_file": "./extern/test_bmi_cpp/cmake_build/libtestbmicppmodel.so",
                "init_config": "./data/bmi/c/test/test_bmi_c_config.ini",
                "main_output_variable": "OUTPUT_VAR_2",
                "variables_names_map" : {
                  "INPUT_VAR_2": "TMP_2maboveground",
                  "INPUT_VAR_1": "precip_rate"
                },
                "create_function": "bmi_model_create",
                "destroy_function": "bmi_model_destroy",
                "uses_forcing_file": false
              }
            }
          ],
          "forcing": {
              "path": "./data/forcing/cat-52_2015-12-01 00_00_00_2015-12-30 23_00_00.csv"
          }
        },
        "cat-67": {
          "formulations": [
            {
              "name": "bmi_c++",
              "params": {
                "model_type_name": "test_bmi_cpp",
                "library_file": "./extern/test_bmi_cpp/cmake_build/libtestbmicppmodel.so",
                "init_config": "./data/bmi/c/test/test_bmi_c_config.ini",
                "main_output_variable": "OUTPUT_VAR_2",
                "variables_names_map" : {
                  "INPUT_VAR_2": "TMP_2maboveground",
                  "INPUT_VAR_1": "precip_rate"
                },
                "create_function": "bmi_model_create",
                "destroy_function": "bmi_model_destroy",
                "uses_forcing_file": false
              }
            }
          ],
          "forcing": {
              "path": "./data/forcing/cat-67_2015-12-01 00_00_00_2015-12-30 23_00_00.csv"
          }
        }
    }
}
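One consistency check that the "outputs" section invites is that every variable only references dimensions defined in the same file entry. The sketch below assumes the layout shown in the example (a "dimensions" object per file and a comma-separated "dimensions" string per variable); the function name is hypothetical and this is not ngen's own parser.

```python
import json
from typing import Dict, List, Tuple


def undefined_dimensions(config: dict) -> Dict[Tuple[str, str], List[str]]:
    """Map (file key, variable name) to the dimensions it uses but never defines."""
    problems = {}
    for file_key, file_def in config.get("outputs", {}).items():
        defined = set(file_def.get("dimensions", {}))
        for var_name, var_def in file_def.get("variables", {}).items():
            used = [d.strip() for d in var_def["dimensions"].split(",")]
            missing = [d for d in used if d not in defined]
            if missing:
                problems[(file_key, var_name)] = missing
    return problems


# A trimmed copy of the "nc_file1" entry from the example configuration.
example = json.loads("""
{
  "outputs": {
    "nc_file1": {
      "name": "netcdf_output_1.nc",
      "type": "NetCDF4",
      "dimensions": {"X": {"size": 1000}, "Y": {"size": 500}},
      "variables": {
        "soil_moisture_content": {"type": "float", "dimensions": "X, Y"}
      }
    }
  }
}
""")

problems = undefined_dimensions(example)
```

For the trimmed example, `problems` is empty; a variable declared with `"dimensions" : "X, T"` would be reported with `["T"]`.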

New Status

Information on a NetCDF file to be created can now be part of a config file.
Layers should be linked to NetCDF output files, one file per layer.

Open Question

For any given layer, how do we determine which variables from contained models are output? There are several possibilities.

List all variables defined in any given model. This will lead to NaNs in the output data when not all models have the same output variables.
Set up output variable lists for particular layers (Layer 0, etc.) and output only these expected variables.
Allow NetCDF file creation to include a layer with the file definition.
Others?

We are looking for a solution that will work for both operations and research usage.
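To make the trade-off between the first two options concrete, here is a small sketch. It assumes per-catchment outputs are available as dictionaries of variable name to value; the variable names `Q_OUT` and `SOIL_M` and both function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Outputs = Dict[str, Dict[str, float]]


def union_of_variables(outputs: Outputs) -> Tuple[List[str], Dict[str, List[float]]]:
    """Option 1: write every variable any model defines; gaps become NaN."""
    names = sorted({name for values in outputs.values() for name in values})
    rows = {
        cid: [values.get(name, math.nan) for name in names]
        for cid, values in outputs.items()
    }
    return names, rows


def configured_variables(outputs: Outputs, expected: List[str]) -> Dict[str, List[float]]:
    """Option 2: write only a configured list; a missing variable is an error."""
    rows = {}
    for cid, values in outputs.items():
        missing = [name for name in expected if name not in values]
        if missing:
            raise KeyError(f"{cid} does not provide {missing}")
        rows[cid] = [values[name] for name in expected]
    return rows


outputs = {"cat-27": {"Q_OUT": 1.5, "SOIL_M": 0.3}, "cat-52": {"Q_OUT": 2.0}}
names, rows = union_of_variables(outputs)
```

With the union approach, `cat-52` gets a NaN in the `SOIL_M` column; with a configured list of `["Q_OUT"]`, both catchments produce complete rows, and asking for `SOIL_M` fails early instead of silently writing NaN.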

Default variable sizes from layer type

Catchment layers will create variables with dimensions [time, number-of-catchments]
Nexus layers will create variables with dimensions [time, number-of-nexuses]
Domain layers will create variables with dimensions [time, bmi-grid-x-size, bmi-grid-y-size]
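The defaults above amount to a small lookup from layer type to dimension list. A minimal sketch, assuming the layer types are spelled "catchment", "nexus", and "domain" and using `None` to mark the unlimited time dimension (both are assumptions made for illustration):

```python
from typing import List, Optional, Tuple

Dim = Tuple[str, Optional[int]]


def default_dimensions(
    layer_type: str,
    num_catchments: int = 0,
    num_nexuses: int = 0,
    grid_shape: Tuple[int, int] = (0, 0),
) -> List[Dim]:
    """Return (name, size) pairs for a layer's default variable dimensions."""
    time: Dim = ("time", None)  # None stands for an unlimited dimension here
    if layer_type == "catchment":
        return [time, ("number-of-catchments", num_catchments)]
    if layer_type == "nexus":
        return [time, ("number-of-nexuses", num_nexuses)]
    if layer_type == "domain":
        x_size, y_size = grid_shape
        return [time, ("bmi-grid-x-size", x_size), ("bmi-grid-y-size", y_size)]
    raise ValueError(f"unknown layer type: {layer_type!r}")


catchment_dims = default_dimensions("catchment", num_catchments=3)
domain_dims = default_dimensions("domain", grid_shape=(1000, 500))
```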


program-- · 2024-01-24T16:39:05Z

For context/reference:

Using a custom writer class for Nexuses: Implement Nexus Writer for t-route formatted QLAT outputs #612 (comment)
mdframe-based implementation for general use: https://github.com/NOAA-OWP/ngen/blob/master/src/utilities/mdframe/handler_netcdf.cpp

They may/may not be helpful in this case

donaldwj · 2024-01-24T16:43:36Z

Certainly worth looking at. Ideally we don't want netcdf creation code in more locations than necessary.

PhilMiller · 2024-01-25T00:22:46Z

Wasn't the intention with the mdframe work that the rest of the framework code would assign its results into that, and the output would only have to be configured and implemented relative to mdframe?

PhilMiller · 2024-01-25T00:23:25Z

per catchment svg files

CSV files?

PhilMiller · 2024-01-25T00:27:10Z

Is there a particular format of NetCDF files (variable names, dimension ordering, metadata conventions, etc.) that the forcings engine or other common code around NWS or NOAA or the broader community expects? If so, the NetCDF files we generate should ideally conform to that.

PhilMiller · 2024-01-25T00:27:31Z

@jduckerOWP

jduckerOWP · 2024-01-25T13:21:08Z

NextGen_sample_forcing_output.tar.gz

I've dropped a tar file that contains a sample output of the NextGen hydrofabric forcing file, which essentially has the standard formatting from previous NWM forcings. I would advise sticking to at least the netcdf metadata formatting highlighted here for the forcing variables. Gridded data would have "x", "y", and "crs" variables describing the geospatial coordinates and coordinate reference system of the gridded output, instead of "catchment ids" like in the attached file. Mesh forcing data, on the other hand, would need output along both the elements and the nodes of a mesh, depending on how a given mesh model handles forcing data. I think we should have a further discussion, at least about what mdframe is expected to output within mesh domains along the coastal or Great Lakes regions for a given NextGen user.

donaldwj · 2024-01-25T15:03:59Z

Phil, the current operational files do have a particular format, and we will need the ability to recreate them. However, they do not follow general practice, which makes the current files difficult to use.

donaldwj · 2024-01-25T17:13:54Z

Gridded data can contain the projected coordinates, although that is somewhat wasteful of space, particularly when writing one file per timestep. I would like to improve the hydrofabric output files: we can keep all existing data, but the spatial location of outputs from the hydrofabric should be included in addition to the id of the location. The current stream output files are very hard to use.

program-- · 2024-01-25T18:06:35Z

Gridded data can contain the projected coordinates, although that is somewhat wasteful of space, particularly when writing one file per timestep.

I would base your decisions off https://gdal.org/drivers/raster/netcdf.html#georeference and the supporting documentation there. Additionally, if space is a concern, maybe a reduced horizontal grid would help since it's (generally) better compressible: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#reduced-horizontal-grid

alternatively: not sure if it's widely-usable, but grid metadata can build a grid as well: origin, extent, and resolution.
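For illustration, rebuilding coordinates from grid metadata could look like the sketch below. It assumes the origin is the outer edge of the first cell and that coordinates refer to cell centers; neither convention is stated in the thread, and the function name is hypothetical.

```python
from typing import List


def cell_centers(origin: float, resolution: float, count: int) -> List[float]:
    """Cell-center coordinates along one axis, starting at the grid edge `origin`."""
    return [origin + (i + 0.5) * resolution for i in range(count)]


# Three 1000-unit cells in x starting at 0, two 250-unit cells in y starting at -500.
x = cell_centers(0.0, 1000.0, 3)
y = cell_centers(-500.0, 250.0, 2)
```

Storing only origin, resolution, and cell counts as attributes avoids repeating full coordinate arrays in every per-timestep file, at the cost of readers needing to know this convention.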

I would like to improve the hydrofabric output files: we can keep all existing data, but the spatial location of outputs from the hydrofabric should be included in addition to the id of the location. The current stream output files are very hard to use.

On the other hand, I'd probably stay away from trying to include vector geometry (other than points) in netCDF... it's annoyingly complicated http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#geometries (has to be CF-1.8 for GDAL-compat, and only that or WKT is supported, but not WKB, which would've been easier 🤦🏾). From a data-science perspective, joining the outputs to the hydrofabric is trivial, since it's just a LEFT JOIN, so I wouldn't worry too much about usability in that aspect. Also, adding geometries is going to blow up the size of the outputs...
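The join mentioned above can be sketched with plain dictionaries. The field names (`id`, `lon`, `lat`, `streamflow`) and the sample values are made up for illustration; in practice this would more likely be done with a dataframe or SQL tool.

```python
from typing import Dict, List


def left_join(output_rows: List[dict], hydrofabric: Dict[str, dict], key: str = "id") -> List[dict]:
    """Attach hydrofabric attributes to each output row, keeping unmatched rows."""
    joined = []
    for row in output_rows:
        attrs = hydrofabric.get(row[key], {})  # no match: row is kept as-is
        joined.append({**attrs, **row})
    return joined


# Hypothetical sample data.
hydrofabric = {"nex-26": {"lon": -80.1, "lat": 35.2}}
output_rows = [
    {"id": "nex-26", "streamflow": 1.25},
    {"id": "nex-99", "streamflow": 0.5},
]
joined = left_join(output_rows, hydrofabric)
```

Every output row survives the join; rows whose id is absent from the hydrofabric simply carry no location attributes.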

PhilMiller · 2024-02-11T17:05:48Z

Your thoughts on implementation choices in #729 please

hellkite500 · 2024-03-22T17:32:03Z

Adding configuration mechanisms for establishing netcdf output.

donaldwj self-assigned this Jan 24, 2024

hellkite500 mentioned this issue Jan 26, 2024

Enable NetCDF catchment outputs #704

Closed

robertbartel linked a pull request Feb 23, 2024 that will close this issue

Netcdf output #744

Draft

11 tasks

robertbartel mentioned this issue Jun 5, 2024

Investigate IO performance issue with workers and object-store-backed datasets NOAA-OWP/DMOD#637

Closed
