Add NWM netcdf re-formatting script #27

Draft: wants to merge 5 commits into main


jameshalgren (Contributor) commented Jan 4, 2023

WARNING: does not work in its initial state; it contains compatibility issues to be fixed in subsequent commits.

A version of this script is used operationally to transform the NWM native outputs into this post-processed form.

Essentially, the post-processing combines all timesteps from a given forecast cycle into a single file and writes the data so that values for a given location are contiguous in the dataset.
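The reshaping idea can be illustrated in memory with NumPy. This is a minimal sketch, not the script in this PR: `combine_timesteps` is a hypothetical name, and it assumes each native output has already been read into a 1-D array indexed by location (feature) in the same order for every timestep. The real script works on netCDF files and also has to choose on-disk chunking.

```python
import numpy as np


def combine_timesteps(per_step_arrays):
    """Combine one 1-D array per timestep into a (location, time) array.

    Hypothetical helper: assumes every array lists locations in the same order.
    """
    # Shape (time, location): row t holds every location at timestep t.
    stacked = np.stack(per_step_arrays, axis=0)
    # Transpose to (location, time) and force a C-ordered copy, so the time
    # axis varies fastest and each location's timeseries is contiguous.
    return np.ascontiguousarray(stacked.T)


# Four timesteps for three locations.
steps = [np.array([1.0, 2.0, 3.0]) * t for t in range(1, 5)]
combined = combine_timesteps(steps)
print(combined.shape)  # (3, 4)
print(combined[1])     # location 1 across all timesteps: [2. 4. 6. 8.]
```

Reading one row of `combined` now touches a single contiguous block, which is the access pattern the post-processed files are meant to favor.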

Running the simple test in this repository (use the wget scripts to download the data, then run python test_create_timeseries.py) shows that it takes about 3 seconds to convert one forecast cycle's worth of 18 short_range outputs and about 50 seconds for one set of medium_range outputs. Note that linear scaling from the short_range result would predict slightly less time, so we probably have some tuning to do.

```
running short_range files
3.2564339637756348
running medium_range files
50.05790591239929
```
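The linear-scaling comparison can be made explicit with a little arithmetic on the two logged timings (seconds). This is only a sketch: the number of files in the medium_range test set is not stated here, so `n_files` is left as a parameter rather than filled in, and `linear_estimate` is a hypothetical helper.

```python
def linear_estimate(ref_seconds, ref_files, n_files):
    """Predict runtime for n_files, assuming cost grows linearly with file count."""
    return ref_seconds * n_files / ref_files


SHORT_RANGE_SECONDS = 3.2564339637756348   # logged above, 18 short_range files
MEDIUM_RANGE_SECONDS = 50.05790591239929   # logged above, medium_range set

# Roughly 0.18 s per short_range file.
per_file = SHORT_RANGE_SECONDS / 18

# File count at which linear scaling would exactly reproduce the medium_range
# time (about 277); any smaller medium_range set means the measured run is
# slower than linear scaling predicts.
break_even_files = MEDIUM_RANGE_SECONDS / per_file
print(round(per_file, 3), round(break_even_files, 1))
```

Calling `linear_estimate(SHORT_RANGE_SECONDS, 18, n_files)` with the actual medium_range file count gives the "linear" time to compare against the measured 50 s.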

Also, the function has been tweaked to allow larger chunks and more total memory (10 GB) for the medium_range test.

TODO:

  • Use the name-generating utility from this repository to replace the long list of text files in the test-script commit.
  • Take advantage of the logging capability in the test script.
  • Fix the data_variables bug that modifies the main array.
  • Pedantically check the output to show that it is identical after transformation.
  • Explore query performance with different chunk sizes.
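For the "pedantically check the output" item, one possible check is sketched below under stated assumptions: the native outputs and the transformed data have both been loaded into NumPy arrays, the transformed array is laid out as (location, time), and location order is unchanged. `check_identical` is a hypothetical helper, not part of this repository; a full check would also compare metadata, fill values and any scale/offset encoding applied on write.

```python
import numpy as np


def check_identical(per_step_arrays, combined):
    """Return True if combined[:, t] exactly matches the t-th original array.

    Assumes combined is (location, time) with locations in the original order.
    NaNs compare equal, so missing values do not cause false mismatches.
    """
    expected_shape = (per_step_arrays[0].shape[0], len(per_step_arrays))
    if combined.shape != expected_shape:
        return False
    # Column t of the transformed array must equal timestep t of the input.
    return all(
        np.array_equal(original, combined[:, t], equal_nan=True)
        for t, original in enumerate(per_step_arrays)
    )


steps = [np.array([1.0, np.nan, 3.0]), np.array([4.0, 5.0, 6.0])]
combined = np.array([[1.0, 4.0], [np.nan, 5.0], [3.0, 6.0]])
print(check_identical(steps, combined))  # True
```

Exact equality (rather than a tolerance) fits the goal of proving the transformation is lossless; if the writer applies packing, the comparison should be made on the decoded values.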

James Halgren added 3 commits January 4, 2023 11:32
jameshalgren changed the title from "commit initial transfer script" to "Add NWM netcdf re-formatting script" on Jan 5, 2023
jameshalgren (Contributor, Author) commented Jan 5, 2023

@AndersNilssonNoaa: the last commit works around unexpected behavior when the function is called twice. The data_variables input was being modified inside the routine, so repeating the call without explicitly re-initializing data_variables produced an error. This probably doesn't affect anything on your side, because it's best to create a completely clean set of input variables for each call anyway.

@CoreyKrewson-NOAA, @karnesh, @arpita0911patel
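The failure mode described in this comment is the common Python pitfall of a function mutating a mutable argument. The illustration below is hypothetical: the function names and the `"streamflow"` entry are made up, and the script's actual code is not reproduced. It contrasts an in-place mutation with working on a copy.

```python
def process_buggy(data_variables):
    # Mutates the caller's list: a second call with the same list fails,
    # because "streamflow" has already been removed.
    data_variables.remove("streamflow")
    return data_variables


def process_fixed(data_variables):
    # Work on a shallow copy so the caller's input is unchanged between calls.
    local_vars = list(data_variables)
    local_vars.remove("streamflow")
    return local_vars


inputs = ["streamflow", "velocity"]
print(process_fixed(inputs))  # ['velocity']
print(process_fixed(inputs))  # ['velocity'] again; inputs is untouched
```

A shallow copy is enough for a flat list of names; if the input were a nested structure (for example a dict of lists), `copy.deepcopy` would be the safer choice.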

jameshalgren (Contributor, Author) commented:

ping @karnesh

jameshalgren (Contributor, Author) commented:

ping @Castronova @igarousi
