Review - Preprocessors #93

Open
jwagemann opened this issue Aug 6, 2019 · 1 comment

@jwagemann

  • What resolution and extent did you use for the Unified Data Format?

  • The preprocessors use a reference grid. Have you considered using EPSG codes instead?

  • What remapping method is used?

  • CDS longitude range is [0, 360], while many other data providers use the range [-180, +180]. Is the preprocessor automatically rotating the layers, if needed?

  • How do you define how to do spatial aggregations for different variables? For instance for temperature you might want to use the mean, for precipitation you might want to use the sum (if you are converting to a coarser resolution). We can only see a MeanAggregator.

@tommylees112

Thanks for your questions!

The first thing to say is that all of these parameters are flexible: the pipeline lets you specify each of them as you require. We have made some initial choices for our current experiments, but these are only one possible set of parameter choices for the pipeline.

What resolution and extent did you use for the Unified Data Format?

We used an extent for Kenya, defined by this bounding box:

    Region(name='kenya', lonmin=33.501, lonmax=42.283,
           latmin=-5.202, latmax=6.002)

The resolution we are currently using is ~5 km, but we are in the process of changing this for our own experiments.
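To give a feel for what this extent and resolution imply, here is a rough sketch that estimates the grid size. The `Region` namedtuple below is a hypothetical stand-in that only reuses the field names from the snippet above, and `grid_shape` and the 111.32 km per degree constant are assumptions for illustration (the constant is a reasonable approximation for longitude too, since Kenya straddles the equator).

```python
import math
from collections import namedtuple

# Hypothetical stand-in for the pipeline's Region object; only the field
# names are taken from the bounding box quoted above.
Region = namedtuple("Region", ["name", "lonmin", "lonmax", "latmin", "latmax"])

# Approximate length of one degree of latitude (assumed constant here).
KM_PER_DEGREE = 111.32


def grid_shape(region, resolution_km):
    """Approximate (n_lat, n_lon) of a regular lat/lon grid covering the region."""
    step_deg = resolution_km / KM_PER_DEGREE
    n_lat = math.ceil((region.latmax - region.latmin) / step_deg)
    n_lon = math.ceil((region.lonmax - region.lonmin) / step_deg)
    return n_lat, n_lon


kenya = Region(name="kenya", lonmin=33.501, lonmax=42.283,
               latmin=-5.202, latmax=6.002)
print(grid_shape(kenya, resolution_km=5.0))
```

Under these assumptions the grid has a few hundred cells along each axis; the exact numbers depend on the reference file actually used.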

The preprocessors use a reference grid. Have you considered using EPSG codes instead?

The reference grid is an existing .nc file that has the lat/lon resolution onto which the user wants to map all other data. We have not considered using EPSG codes, but would be interested in looking at this if you have a Python implementation of remapping netCDF files using EPSG codes. Just to clarify: we are not transforming data between different projections, only putting all data onto the same resolution.

What remapping method is used?

This can be flexibly specified by the user from the following (see here for explanations):

{'bilinear', 'conservative', 'nearest_s2d', 'nearest_d2s', 'patch'}

We used nearest_s2d.
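As a rough illustration of what nearest source-to-destination remapping onto a reference grid does, here is a minimal pure-Python sketch. It is not the pipeline's code or the regridding library's implementation: the function names are hypothetical, and it assumes regular rectilinear grids, so the nearest cell can be found separately along latitude and longitude.

```python
def nearest_index(source_coords, target):
    """Index of the source coordinate closest to the target value."""
    return min(range(len(source_coords)),
               key=lambda i: abs(source_coords[i] - target))


def regrid_nearest_s2d(data, src_lats, src_lons, ref_lats, ref_lons):
    """Fill each reference-grid cell with the value of the nearest source cell.

    `data` is a list of rows indexed as data[lat_index][lon_index].
    """
    lat_idx = [nearest_index(src_lats, lat) for lat in ref_lats]
    lon_idx = [nearest_index(src_lons, lon) for lon in ref_lons]
    return [[data[i][j] for j in lon_idx] for i in lat_idx]


# A coarse 2x2 source field mapped onto a finer 4x2 reference grid.
source = [[1, 2],
          [3, 4]]
remapped = regrid_nearest_s2d(source,
                              src_lats=[0.0, 1.0], src_lons=[0.0, 1.0],
                              ref_lats=[0.0, 0.4, 0.6, 1.0],
                              ref_lons=[0.1, 0.9])
print(remapped)
```

Each reference cell simply copies its closest source value, so no new values are created; this is the main difference from methods such as bilinear or conservative.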

CDS longitude range is [0, 360], while many other data providers use the range [-180, +180]. Is the preprocessor automatically rotating the layers, if needed?

Yes, it is automatic.
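For readers unfamiliar with the rotation, the conversion can be sketched as follows. This is an illustrative pure-Python version with a hypothetical `wrap_longitudes` helper, not the preprocessor's actual code: it shifts longitudes from [0, 360) to [-180, 180) and reorders the data columns so longitudes stay ascending.

```python
def wrap_longitudes(lons, rows):
    """Map longitudes from [0, 360) to [-180, 180) and reorder columns to match.

    `rows` is a list of data rows, each with one value per longitude.
    """
    wrapped = [((lon + 180.0) % 360.0) - 180.0 for lon in lons]
    # Sort column indices so the new longitude axis is ascending.
    order = sorted(range(len(wrapped)), key=lambda i: wrapped[i])
    new_lons = [wrapped[i] for i in order]
    new_rows = [[row[i] for i in order] for row in rows]
    return new_lons, new_rows


lons = [0.0, 90.0, 180.0, 270.0]
rows = [[10, 20, 30, 40]]
new_lons, new_rows = wrap_longitudes(lons, rows)
print(new_lons)
print(new_rows)
```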

How do you define how to do spatial aggregations for different variables? For instance for temperature you might want to use the mean, for precipitation you might want to use the sum (if you are converting to a coarser resolution). We can only see a MeanAggregator.

We are using the mean as a first implementation in the pipeline; changing this would be a quick fix if required. However, it is worth noting that all values are normalized to have mean 0 and std 1 before they are input to the machine learning models. Because a sum is the mean multiplied by the number of aggregated cells, and normalization removes a constant scale factor, aggregating with a sum or a mean makes no difference to what the models see, as long as every coarse cell aggregates the same number of fine cells.
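A small check of the normalization argument, as a sketch with made-up numbers: the helper names are hypothetical, and it assumes every coarse cell aggregates the same number of fine cells (here 2). If block sizes differed, sum and mean would no longer be related by a single constant and the normalized values could differ.

```python
from statistics import mean, pstdev


def aggregate(values, block, func):
    """Aggregate consecutive blocks of `block` fine cells with `func`."""
    return [func(values[i:i + block]) for i in range(0, len(values), block)]


def standardize(values):
    """Normalize to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


fine = [1.0, 3.0, 2.0, 6.0, 4.0, 8.0]  # illustrative fine-resolution values
z_mean = standardize(aggregate(fine, 2, mean))
z_sum = standardize(aggregate(fine, 2, sum))

# With equal block sizes the two normalized series agree up to rounding.
print(all(abs(a - b) < 1e-12 for a, b in zip(z_mean, z_sum)))
```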
