Description:
The formulation-selector tool, aka Regionalization and Formulation Testing & Selection (RaFTS), is under development. For more information, see the Wiki.
As NOAA OWP builds the model-agnostic NextGen framework, the hydrologic modeling community will need to know how to optimally select model formulations and estimate parameter values across ungauged catchments. This problem becomes intractable when considering the unique combinations of current and future model formulations combined with the innumerable possible parameter combinations across the continent. To simplify the model selection problem, we apply an analytical tool that predicts hydrologic formulation performance (Bolotin et al., 2022; Liu et al., 2022) using community-generated data. The RaFTS tool predicts how models might perform across catchments based on catchment attributes. This decision support tool is designed so that, as the hydrologic modeling community generates more results, better decisions can be made about where each formulation is best suited.
Technology stack:
- Python: The features of the formulation-selector that ingest model results and catchment attributes to predict model performance are written in Python.
- R: The features of the formulation-selector that acquire catchment attributes that feed into the model prediction algorithm (which, as noted above, is in Python) are written in R to promote compatibility with the NOAA-OWP/hydrofabric.
Status: Preliminary development. CHANGELOG.
- Technology stack: Python. The formulation-selection decision support tool is intended to be a standalone analysis, though integration with pre-existing formulation evaluation metrics tools is planned.
Thus far, formulation-selector has been developed in and tested with Python versions 3.11 and 3.12, so these are currently the recommended versions.
You may consider creating a new virtual environment for employing formulation-selector with the following packages:
- pynhd
- dask
- joblib
- netcdf4
- numpy
- pandas
- pyyaml
- scikit_learn
- setuptools
- xarray
- NOAA-OWP/hydrofabric
  - Note that the arrow package needs arrow::arrow_with_s3() == TRUE. If FALSE, consider downloading arrow via apache's r-universe
  - Steps to install hydrofabric: Refer to wiki
- USGS nhdplusTools
- Install the fs_proc package: pip install /path/to/pkg/fs_proc/fs_proc/.
- Build a yaml config file /scripts/eval_metrics/name_of_dataset_here/name_of_dataset_schema.yaml (refer to this template)
- Create a script that reads in the data and runs the standardization processing. Example script here
- Then run the following:
cd /path/to/scripts/eval_metrics/name_of_dataset_here/
python proc_name_of_dataset_here_metrics.py "name_of_dataset_here_schema.yaml"
> cd /path/to/pkg/fs_proc/fs_proc
> pip install .
Ingesting raw data describing model metrics (e.g. KGE, NSE) from modeling simulations requires two tasks:
- Creating a custom configuration schema as a .yaml file
- Modifying a dataset ingest script
We track these tasks inside formulation-selector/scripts/eval_ingest/_name_of_raw_dataset_here_/
The data schema yaml file contains the following fields:
- col_schema: required column mappings in the evaluation metrics dataset. These describe the column names in the raw data and how they'll map to standardized column names.
  - for metric_mappings refer to the fs_categories.yaml
- file_io: The location of the input data and desired save location. Also specifies the save file format.
- formulation_metadata: Descriptive traits of the model formulation that generated the metrics. Some of these are required fields while others are optional.
- references: Optional but very helpful metadata describing where the data came from.
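As a rough illustration of how these fields fit together, the sketch below represents a parsed schema as a plain Python dictionary and checks that the documented sections are present. Only the four top-level section names come from the list above. Every nested key and value (for example gage_id, path_data, dir_save), the helper missing_sections, and the choice of which sections to treat as required are hypothetical placeholders, so refer to the template for the real field names.

```python
# Hypothetical stand-in for a schema file after it has been parsed from YAML.
# Only the four top-level section names come from the documentation; all
# nested keys and values are placeholders for illustration.
example_schema = {
    "col_schema": {
        # standardized name: raw column name (assumed layout)
        "gage_id": "raw_site_column",
        "metric_mappings": {"KGE": "raw_kge_column", "NSE": "raw_nse_column"},
    },
    "file_io": {
        "path_data": "/path/to/raw_data.csv",
        "dir_save": "/path/to/save_dir",
        "save_type": "netcdf",
    },
    "formulation_metadata": {"dataset_name": "name_of_dataset_here"},
    "references": {"source_url": "https://example.org/dataset"},
}

# Assumption: these three sections must exist; `references` is optional.
REQUIRED_SECTIONS = ("col_schema", "file_io", "formulation_metadata")


def missing_sections(schema: dict) -> list[str]:
    """Return the required top-level sections absent from a parsed schema."""
    return [name for name in REQUIRED_SECTIONS if name not in schema]


# A complete schema reports no gaps; an incomplete one lists what is absent.
print(missing_sections(example_schema))
print(missing_sections({"col_schema": {}}))
```

A check like this can run before any processing so that an incomplete schema fails early with a readable message.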
The dataset ingest script converts the raw data into the desired format. It performs the following tasks:
- Read in the data schema yaml file (standardized)
- Ingest the raw data (standardized)
- Modify the raw data to become wide-format where columns consist of the gage id and separate columns for each formulation evaluation metric (user-developed munging)
- Call fs_proc.proc_col_schema() to standardize the dataset into a common format (standardized function call)
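The user-developed munging step depends on the raw dataset, but the target shape is the same: one row per gage, with a separate column for each evaluation metric. The following minimal sketch uses only the standard library and assumes the raw data arrive in long format. The record keys (gage_id, metric, value), the gage names, the metric values, and the helper to_wide are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per gage and metric.
long_rows = [
    {"gage_id": "gage_A", "metric": "KGE", "value": 0.71},
    {"gage_id": "gage_A", "metric": "NSE", "value": 0.64},
    {"gage_id": "gage_B", "metric": "KGE", "value": 0.58},
    {"gage_id": "gage_B", "metric": "NSE", "value": 0.49},
]


def to_wide(rows: list[dict]) -> list[dict]:
    """Pivot long-format rows into one row per gage with a column per metric."""
    by_gage: dict[str, dict] = defaultdict(dict)
    for row in rows:
        by_gage[row["gage_id"]][row["metric"]] = row["value"]
    # Sort by gage id so the output order is deterministic.
    return [{"gage_id": gage, **metrics} for gage, metrics in sorted(by_gage.items())]


wide_rows = to_wide(long_rows)
for row in wide_rows:
    print(row)
```

With pandas, which is among the listed dependencies, the same reshaping is usually done with DataFrame.pivot. The wide table would then be handed to fs_proc.proc_col_schema(), whose arguments are not shown here because they are not described in this document.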
cd /path/to/scripts/eval_metrics/name_of_dataset_here/
python proc_name_of_dataset_here_metrics.py "name_of_dataset_here_schema.yaml"
You may also run unit tests on fs_proc:
> cd /path/to/formulation-selection/pkg/fs_proc/fs_proc/tests
> python -m unittest test_proc_eval_metrics.py
To assess code coverage:
python -m coverage run -m unittest
python -m coverage report
If you have questions, concerns, bug reports, etc., please file an issue in this repository's Issue Tracker.
For general instructions on how to contribute, see CONTRIBUTING.
Description:
Attributes from non-standardized datasets may need to be acquired for RaFTS modeling and prediction. The R package proc.attr.hydfab performs the attribute grabbing.
Run flow.install.proc.attr.hydfab.R to install the package. Note that a user may need to modify the section that creates the fs_dir for their custom path to this repo's directory.
The following is an example script that runs the attribute grabber: fs_attrs_grab.
This script grabs attribute data corresponding to locations of interest, and saves those attribute data inside a directory as multiple parquet files. The proc.attr.hydfab::retrieve_attr_exst() function may then efficiently query and then retrieve desired data by variable name and comid from those parquet files.
Note that this script was designed to process data that have already been generated by the fs_proc python package, but users may want to grab attributes from additional locations that have not been processed by fs_proc (e.g. attributes from ungaged basins to use for prediction).
- To independently process attributes for locations without running the fs_proc python package beforehand: A user may ignore previously-processed data by setting Retr_Params$datasets as NULL and specifying the path to a file containing gage_ids inside the Retr_Params$loc_id_read list.
- In the context of reading in a processed dataset from fs_proc or reading in a separate file specifying locations of interest, Retr_Params$datasets uses directory names inside input/user_data_std/ (or simply all to process all datasets). The additional file with location ids may also be read in, or ignored entirely. In summary, either source or both may be used, as determined by how the Retr_Params parameter list object is populated.
- The 'independent' data file for processing attributes has been tested for .csv and .parquet file formats. Other formats compatible with arrow::open_dataset() should be possible but have not been tested.
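The either-or-or-both choice described above can be sketched as a small decision function. This is a Python analogy of the logic only, not code from proc.attr.hydfab, which is an R package. The function name resolve_sources, its arguments, and the decision to raise an error for unknown directories or for no source at all are assumptions made for illustration.

```python
def resolve_sources(datasets, available_dirs, loc_id_file=None):
    """Decide which location sources to process (illustrative analogy only).

    datasets: None (ignore fs_proc output), "all", or a list of directory names.
    available_dirs: directory names found under input/user_data_std/.
    loc_id_file: optional path to a separate file of location ids.
    """
    if datasets is None:
        # Mirrors setting Retr_Params$datasets to NULL: skip processed datasets.
        chosen = []
    elif datasets == "all":
        # Process every dataset directory that exists.
        chosen = sorted(available_dirs)
    else:
        # Assumed behavior: reject names that do not match a known directory.
        unknown = sorted(set(datasets) - set(available_dirs))
        if unknown:
            raise ValueError(f"Unknown dataset directories: {unknown}")
        chosen = list(datasets)
    if not chosen and loc_id_file is None:
        # Assumed behavior: at least one source of locations is needed.
        raise ValueError("Provide datasets, a location id file, or both.")
    return {"datasets": chosen, "loc_id_file": loc_id_file}


# Independent file only, all processed datasets, or both together.
print(resolve_sources(None, ["ds_a", "ds_b"], "locations.csv"))
print(resolve_sources("all", ["ds_b", "ds_a"]))
print(resolve_sources(["ds_a"], ["ds_a", "ds_b"], "locations.parquet"))
```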
Bolotin, LA, Haces-Garcia, F, Liao, M, Liu, Q, Frame, J, Ogden, FL (2022). Data-driven Model Selection in the Next Generation Water Resources Modeling Framework. In Deardorff, E., A. Modaresi Rad, et al. (2022). National Water Center Innovators Program - Summer Institute, CUAHSI Technical Report, HydroShare, http://www.hydroshare.org/resource/096e7badabb44c9f8c29751098f83afa
Liu, Q, Bolotin, L, Haces-Garcia, F, Liao, M, Ogden, FL, Frame, JM (2022). Automated Decision Support for Model Selection in the Nextgen National Water Model. Abstract (H45I-1503) presented at 2022 AGU Fall Meeting, 12-16 Dec.