Commit 68d9ef8

senesis, Klaus Zimmermann, sloosvel, Javier Vegas-Regidor, Benjamin Müller
authored
Handle IPSL-CM6 (#1153)
* Add basic support for variable mappings
* Add first era5 mapping
* Find files for CMIP6 DCPP startdates (#771)
* First attempte
* Do not require start and end years, add them later
* Correct condition
* Avoid key error in fx variables
* Consider two possible paths
* Fix function name
* Fix variable name
* Avoid duplicates in filename
* Add test for startdate expansion
* Add test for the replace tags method
* Rename tag
* Add documentation
* Allow to load subexps per timerange or as a whole
* Fix condition
* Remove 'all_years' functionality
* Fix conditions
* Fix flake
* Remove whitespace
Co-authored-by: Javier Vegas-Regidor <[email protected]>
* Skip regridding if the target grid is almost identical to the source grid (#507)
Co-authored-by: Bouwe Andela <[email protected]>
Co-authored-by: Stef Smeets <[email protected]>
* Fixes for sos and siconc of BCC models (#1090)
* sos and siconc fixed
* tests added
* test fixed
* fix flake8
* fix flake8
* fix codacy issue
* Pin cf-units and fix tests (cf-units>=2.1.5) (#1140)
* pin cf-units
* pin cf-units
* fix test
* fix test
* Handle IPSL-CM6 (the feature won't actually work without #1124)
* class Huss inherits from cass Tas. Also : Fix codacy diags.
* Replace os.system() by subprocess.run()
* Fix flake8 diags
* var_mapping -> extra_facets
* Rename _config/variable_details to _config/extra_facets
* Fix doc re. lack of 'output_file as a dict', and choice of native6
* Fix codacy diags in ipsl_cm6.py
* Use project IPSLCM to handle IPSL-CM6
* Implement changes according to Bouwe's review, 2021/06/07 (except unit tests)
* Add unit tests for _fixes/ipslcm/ipsl_cm6.py
* delete esmvalcore/cmor/_fixes/native6/ipsl_cm6.py
* Delete old file esmvalcore/_config/extra_facets/native6-ipsl-cm6-mappings.yml
* Restore main versions for _recipe.py and cmor_fixes/fix.py
* Restore main version for _recipe.py
* Delete extraneous era5-mappings.yml
* Avoid using mapping_key when calling fix.get_cube_from_list()
* Empty change in fix.py for forcing codacy to re-scan it
* Polish doc
* Polish doc again
* Again...
* and again ...
* Fix typo in comment
* Fixes according to @zklaus review
* Reduce formatting changes
* Update doc/develop/fixing_data.rst
Co-authored-by: Klaus Zimmermann <[email protected]>
* Update doc/develop/fixing_data.rst
Co-authored-by: Klaus Zimmermann <[email protected]>
* Update doc/develop/fixing_data.rst
Co-authored-by: Klaus Zimmermann <[email protected]>
* Update doc/develop/fixing_data.rst
Co-authored-by: Klaus Zimmermann <[email protected]>
* Update doc/develop/fixing_data.rst
Co-authored-by: Klaus Zimmermann <[email protected]>
* Update doc/develop/fixing_data.rst
Co-authored-by: Klaus Zimmermann <[email protected]>
* Update doc/develop/fixing_data.rst
Co-authored-by: Klaus Zimmermann <[email protected]>
* Update doc/quickstart/find_data.rst
Co-authored-by: Klaus Zimmermann <[email protected]>
* Update doc/quickstart/find_data.rst
Co-authored-by: Klaus Zimmermann <[email protected]>
* Minor formatting improvements
* Organize mapping file in each realm in two sections (CMIP6 and IPSL)
Co-authored-by: Klaus Zimmermann <[email protected]>
Co-authored-by: sloosvel <[email protected]>
Co-authored-by: Javier Vegas-Regidor <[email protected]>
Co-authored-by: Benjamin Müller <[email protected]>
Co-authored-by: Bouwe Andela <[email protected]>
Co-authored-by: Stef Smeets <[email protected]>
Co-authored-by: Rémi Kazeroni <[email protected]>
Co-authored-by: Valeriu Predoi <[email protected]>
1 parent 6c63c1c commit 68d9ef8

13 files changed

+598
-55
lines changed

doc/develop/fixing_data.rst

Lines changed: 62 additions & 18 deletions
Original file line number | Diff line number | Diff line change
@@ -1,32 +1,38 @@
11
.. _fixing_data:
22

33
***********
4-
Dataset fix
4+
Fixing data
55
***********
66

7-
Some (model) datasets contain (known) errors that would normally prevent them
8-
from being processed correctly by the ESMValCore. The errors can be in
7+
The baseline case for ESMValCore input data is fully CMOR-compliant
8+
data that is read using Iris' :func:`iris:iris.load_raw`.
9+
ESMValCore also allows for some departures from compliance (see
10+
:ref:`cmor_check_strictness`). Beyond that situation, some datasets
11+
(either model or observations) contain (known) errors that would
12+
normally prevent them from being processed. The issues can be in
913
the metadata describing the dataset and/or in the actual data.
1014
Typical examples of such errors are missing or wrong attributes (e.g.
1115
attribute ''units'' says 1e-9 but data are actually in 1e-6), missing or
1216
mislabeled coordinates (e.g. ''lev'' instead of ''plev'' or missing
1317
coordinate bounds like ''lat_bnds'') or problems with the actual data
1418
(e.g. cloud liquid water only instead of sum of liquid + ice as specified by the CMIP data request).
1519

16-
The ESMValCore can apply on the fly fixes to datasets that have
17-
known errors that can be fixed automatically.
20+
As an extreme case, some data sources simply are not NetCDF
21+
files and must go through some other data load function.
22+
23+
The ESMValCore can apply on-the-fly fixes to such datasets when
24+
issues can be fixed automatically. This is implemented for a set
25+
of `Natively supported non-CMIP datasets`_. The following provides
26+
details on how to design such fixes.
1827

1928
.. note::
20-
**CMORization as a fix**.
21-
Support for many observational and reanalysis datasets is implemented through
22-
:ref:`CMORizer scripts in the ESMValTool <esmvaltool:new-dataset>`.
23-
However, it is also possible to add support for a dataset that is not part of
24-
a CMIP data request by implementing fixes for it.
25-
This is particularly useful for large datasets, where keeping a copy of both
26-
the original and CMORized dataset is not feasible.
27-
See `Natively supported non-CMIP datasets`_ for a list of currently supported
28-
datasets.
2929

30+
**CMORizer scripts**. Support for many observational and reanalysis
31+
datasets is also possible through a priori reformatting by
32+
:ref:`CMORizer scripts in the ESMValTool <esmvaltool:new-dataset>`,
33+
which are best suited to datasets of small volume.
34+
35+
.. _fix_structure:
3036

3137
Fix structure
3238
=============
@@ -326,7 +332,11 @@ strictness to the highest:
326332
Natively supported non-CMIP datasets
327333
====================================
328334
329-
Fixed datasets are supported through the ``native6`` project.
335+
Some fixed datasets and native model formats are supported through
336+
the ``native6`` project or through a dedicated project.
337+
338+
Observational Datasets
339+
----------------------
330340
Put the files containing the data in the directory that you have configured
331341
for the ``native6`` project in your :ref:`user configuration file`, in a
332342
subdirectory called ``Tier{tier}/{dataset}/{version}/{frequency}/{short_name}``.
@@ -335,13 +345,13 @@ definition in the :ref:`recipe <recipe_overview>`.
335345
Below is a list of datasets currently supported.
336346
337347
ERA5
338-
----
348+
~~~~
339349
340350
- Supported variables: ``clt``, ``evspsbl``, ``evspsblpot``, ``mrro``, ``pr``, ``prsn``, ``ps``, ``psl``, ``ptype``, ``rls``, ``rlds``, ``rsds``, ``rsdt``, ``rss``, ``uas``, ``vas``, ``tas``, ``tasmax``, ``tasmin``, ``tdps``, ``ts``, ``tsn`` (``E1hr``/``Amon``), ``orog`` (``fx``)
341351
- Tier: 3
342352
343353
MSWEP
344-
-----
354+
~~~~~
345355
346356
- Supported variables: ``pr``
347357
- Supported frequencies: ``mon``, ``day``, ``3hr``.
@@ -354,6 +364,39 @@ For example for monthly data, place the files in the ``/Tier3/MSWEP/latestversio
354364
355365
For more info: http://www.gloh2o.org/
356366
367+
.. _fixing_native_models:
368+
369+
Native models
370+
-------------
371+
372+
The following models are natively supported through the procedure described
373+
above (:ref:`fix_structure`) and in :ref:`configure_native_models`:
374+
375+
IPSL-CM6
376+
~~~~~~~~
377+
378+
Both output formats (i.e. the ``Output`` and the ``Analyse / Time series``
379+
formats) are supported, and should be configured in recipes as e.g.:
380+
381+
.. code-block:: yaml
382+
383+
datasets:
384+
- {simulation: CM61-LR-hist-03.1950, exp: piControl, freq: Analyse/TS_MO,
385+
account: p86caub, status: PROD, dataset: IPSL-CM6, project: IPSLCM,
386+
root: /thredds/tgcc/store}
387+
- {simulation: CM61-LR-hist-03.1950, exp: historical, freq: Output/MO,
388+
account: p86caub, status: PROD, dataset: IPSL-CM6, project: IPSLCM,
389+
root: /thredds/tgcc/store}
390+
391+
.. _ipslcm_extra_facets_example:
392+
393+
The ``Output`` format is an example of a case where variables are grouped in
394+
multi-variable files whose names cannot be computed from dataset attributes
395+
alone but require an extra facets file. The principles of such files are
396+
explained in :ref:`extra_facets`, and the content of this one is :download:`available here
397+
</../esmvalcore/_config/extra_facets/ipslcm-mappings.yml>`. These multi-variable
398+
files must also undergo some data selection.
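
Reviewer sketch (not part of the commit): the role of an extra facets file for multi-variable files can be illustrated with a small lookup table. The keys ``group`` and ``ipsl_varname``, the file name pattern and all values below are assumptions made for illustration; they are not copied from ``ipslcm-mappings.yml``, and the "file" is modelled as a plain dictionary rather than loaded with Iris.

```python
# Hypothetical extra facets: for each CMIP6 short_name, the multi-variable
# file group that holds it and the model's own variable name (values invented).
EXTRA_FACETS = {
    "tas": {"group": "histmth", "ipsl_varname": "t2m"},
    "pr": {"group": "histmth", "ipsl_varname": "precip"},
}


def locate(short_name, simulation):
    """Return (file name pattern, native variable name) for one variable."""
    extra = EXTRA_FACETS[short_name]
    # Assumed naming scheme: the file name depends on the group, not the variable.
    pattern = f"{simulation}_*_{extra['group']}.nc"
    return pattern, extra["ipsl_varname"]


def select(file_content, native_name):
    """Pick one variable out of a multi-variable file (modelled as a dict)."""
    return file_content[native_name]


pattern, native_name = locate("tas", "CM61-LR-hist-03.1950")
fake_file = {"t2m": [280.1, 281.3], "precip": [1e-5, 2e-5]}
print(pattern, select(fake_file, native_name))
```

The point of the sketch is only that the recipe's ``short_name`` is not enough to name the file: a per-variable table supplies the missing pieces, and a selection step then extracts the requested variable.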
399+
357400
.. _extra-facets-fixes:
358401
359402
Use of extra facets in fixes
@@ -370,4 +413,5 @@ variable to the rest of the processing chain.
370413
371414
Normally, the applicable standard for variables is CMIP6.
372415
373-
For more details, refer to existing uses of this feature as examples.
416+
For more details, refer to existing uses of this feature as examples,
417+
e.g. :ref:`for IPSL-CM6 <ipslcm_extra_facets_example>`.

doc/develop/index.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,5 +10,5 @@ features.
1010
:maxdepth: 1
1111

1212
Preprocessor function <preprocessor_function>
13-
Dataset fix <fixing_data>
13+
Fixing data <fixing_data>
1414
Deriving a variable <derivation>

doc/quickstart/configure.rst

Lines changed: 45 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -177,7 +177,9 @@ It will be installed along with ESMValCore and can also be viewed on GitHub:
177177
<https://github.com/ESMValGroup/ESMValCore/blob/main/esmvalcore/config-developer.yml>`_.
178178
This configuration file describes the file system structure and CMOR tables for several
179179
key projects (CMIP6, CMIP5, obs4mips, OBS6, OBS) on several key machines (e.g. BADC, CP4CDS, DKRZ,
180-
ETHZ, SMHI, BSC). CMIP data is stored as part of the Earth System Grid
180+
ETHZ, SMHI, BSC), and for native output data for some
181+
models (IPSL, ... see :ref:`configure_native_models`).
182+
CMIP data is stored as part of the Earth System Grid
181183
Federation (ESGF) and the standards for file naming and paths to files are set
182184
out by CMOR and DRS. For a detailed description of these standards and their
183185
adoption in ESMValCore, we refer the user to :ref:`CMOR-DRS` section where we
@@ -289,6 +291,48 @@ related to CMOR table settings available:
289291
to get the name of the file containing the ``mip`` table.
290292
Defaults to the value provided in ``cmor_type``.
291293

294+
.. _configure_native_models:
295+
296+
Configuring native models and observation data sets
297+
----------------------------------------------------
298+
299+
ESMValCore can be configured for handling native model output formats
300+
and specific
301+
observation data sets without preliminary reformatting. You can choose
302+
to host this new data source either under a dedicated project or under
303+
project ``native6``; when choosing the latter, such a configuration
304+
involves the following steps:
305+
306+
- allowing ESMValTool to locate the data files:
307+
308+
- entry ``native6`` of ``config-developer.yml`` should be
309+
complemented with sub-entries for ``input_dir`` and ``input_file``
310+
that go under a new key representing the
311+
data organization (such as ``MY_DATA_ORG``), and these sub-entries can
312+
use an arbitrary list of ``{placeholders}``. Example:
313+
314+
.. code-block:: yaml
315+
316+
native6:
317+
  ...
318+
  input_dir:
319+
    default: 'Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}'
320+
    MY_DATA_ORG: '{model}/{exp}/{simulation}/{version}/{type}'
321+
  input_file:
322+
    default: '*.nc'
323+
    MY_DATA_ORG: '{simulation}_*.nc'
324+
  ...
325+
326+
- if necessary, provide a so-called ``extra facets file`` which
327+
makes it possible to cope, e.g., with variable naming issues when finding
328+
files. See :ref:`extra_facets` and :download:`this example of
329+
such a file for IPSL-CM6
330+
<../../esmvalcore/_config/extra_facets/ipslcm-mappings.yml>`.
331+
332+
- ensuring that ESMValCore gets the right metadata and data out of
333+
your data files: this is described in :ref:`fixing_data`
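
Reviewer sketch (not part of the commit): the way ``{placeholders}`` in ``input_dir`` and ``input_file`` turn into a file search pattern can be illustrated with plain string formatting. The templates are copied from the ``native6`` example above; the function ``expand``, the root directory and every facet value are hypothetical, and ESMValCore's real data finder is more involved than this.

```python
from pathlib import PurePosixPath

# Templates copied from the ``native6`` example; MY_DATA_ORG is the example key.
INPUT_DIR = {
    "default": "Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}",
    "MY_DATA_ORG": "{model}/{exp}/{simulation}/{version}/{type}",
}
INPUT_FILE = {
    "default": "*.nc",
    "MY_DATA_ORG": "{simulation}_*.nc",
}


def expand(root, drs, facets):
    """Return the glob pattern obtained by filling the templates for ``drs``."""
    directory = INPUT_DIR[drs].format(**facets)
    filename = INPUT_FILE[drs].format(**facets)
    return str(PurePosixPath(root) / directory / filename)


# Illustrative facet values only; none of them come from a real archive.
facets = {
    "model": "MYMODEL",
    "exp": "historical",
    "simulation": "run01",
    "version": "v1",
    "type": "monthly",
}
pattern = expand("/data/native", "MY_DATA_ORG", facets)
print(pattern)
```

A facet that a template needs but the dataset entry does not provide raises ``KeyError`` in this sketch, which mirrors why every placeholder used in the templates has to be resolvable from the recipe or from extra facets.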
334+
335+
292336
.. _config-ref:
293337

294338
References configuration file

doc/quickstart/find_data.rst

Lines changed: 43 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
.. _findingdata:
22

33
************
4-
Finding data
4+
Input data
55
************
66

77
Overview
@@ -15,10 +15,13 @@ the right data. We will detail below the data finding and retrieval process and
1515
the input the user needs to specify, giving examples on how to use the data
1616
finding routine under different scenarios.
1717

18+
Data types
19+
==========
20+
1821
.. _CMOR-DRS:
1922

20-
CMIP data - CMOR Data Reference Syntax (DRS) and the ESGF
21-
=========================================================
23+
CMIP data
24+
---------
2225
CMIP data is widely available via the Earth System Grid Federation
2326
(`ESGF <https://esgf.llnl.gov/>`_) and is accessible to users either
2427
via download from the ESGF portal or through the ESGF data nodes hosted
@@ -45,6 +48,40 @@ From the ESMValTool user perspective the number of data input parameters is
4548
optimized to allow for ease of use. We detail this procedure in the next
4649
section.
4750

51+
Native model data
52+
-----------------
53+
Support for native model data that is not formatted according to a CMIP
54+
data request is quite easy using the basic
55+
:ref:`ESMValCore fix procedure <fixing_data>` and has been implemented
56+
for some models :ref:`as described here <fixing_native_models>`.
57+
58+
Observational data
59+
------------------
60+
Some observational data is retrieved in the same manner as CMIP data, for example
61+
using the ``OBS`` root path set to:
62+
63+
.. code-block:: yaml
64+
65+
OBS: /gws/nopw/j04/esmeval/obsdata-v2
66+
67+
and the dataset:
68+
69+
.. code-block:: yaml
70+
71+
- {dataset: ERA-Interim, project: OBS, type: reanaly, version: 1, start_year: 2014, end_year: 2015, tier: 3}
72+
73+
in ``recipe.yml`` in ``datasets`` or ``additional_datasets``, the rules set in
74+
CMOR-DRS_ are used again and the file will be automatically found:
75+
76+
.. code-block::
77+
78+
/gws/nopw/j04/esmeval/obsdata-v2/Tier3/ERA-Interim/OBS_ERA-Interim_reanaly_1_Amon_ta_201401-201412.nc
79+
80+
Since observational data are organized in Tiers depending on their level of
81+
public availability, the ``default`` directory must be structured accordingly
82+
with sub-directories ``TierX`` (``Tier1``, ``Tier2`` or ``Tier3``), even when
83+
``drs: default``.
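
Reviewer sketch (not part of the commit): the ERA-Interim lookup above can be reproduced with two format strings. The directory and file templates here are assumptions inferred from the example path, not copied from ``config-developer.yml``; ``mip`` and ``short_name`` would come from the variable definition in the recipe, and the time range is the one of the example file.

```python
from pathlib import PurePosixPath

# Root path as configured in the example above.
ROOTPATH = {"OBS": "/gws/nopw/j04/esmeval/obsdata-v2"}

# Assumed templates, inferred from the example path (hypothetical).
DIR_TEMPLATE = "Tier{tier}/{dataset}"
FILE_TEMPLATE = (
    "{project}_{dataset}_{type}_{version}_{mip}_{short_name}_{timerange}.nc"
)


def obs_path(facets):
    """Build the expected file path for an OBS dataset from its facets."""
    root = PurePosixPath(ROOTPATH[facets["project"]])
    return str(
        root / DIR_TEMPLATE.format(**facets) / FILE_TEMPLATE.format(**facets)
    )


facets = {
    # From the dataset entry in the recipe.
    "dataset": "ERA-Interim", "project": "OBS", "type": "reanaly",
    "version": 1, "tier": 3,
    # From the variable definition and the file itself (illustrative).
    "mip": "Amon", "short_name": "ta", "timerange": "201401-201412",
}
path = obs_path(facets)
print(path)
```

The ``Tier3`` directory level comes directly from the ``tier`` facet, which is why the directory tree must contain ``TierX`` sub-directories even with ``drs: default``.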
84+
4885
.. _data-retrieval:
4986

5087
Data retrieval
@@ -186,8 +223,8 @@ datasets are listed in any recipe, under either the ``datasets`` and/or
186223
.. code-block:: yaml
187224
188225
datasets:
189-
- {dataset: HadGEM2-CC, project: CMIP5, exp: historical, ensemble: r1i1p1, start_year: 2001, end_year: 2004}
190-
- {dataset: UKESM1-0-LL, project: CMIP6, exp: historical, ensemble: r1i1p1f2, grid: gn, start_year: 2004, end_year: 2014}
226+
- {dataset: HadGEM2-CC, project: CMIP5, exp: historical, ensemble: r1i1p1, start_year: 2001, end_year: 2004}
227+
- {dataset: UKESM1-0-LL, project: CMIP6, exp: historical, ensemble: r1i1p1f2, grid: gn, start_year: 2004, end_year: 2014}
191228
192229
``_data_finder`` will use this information to find data for **all** the variables specified in ``diagnostics/variables``.
193230

@@ -208,7 +245,7 @@ and the dataset you need is specified in your ``recipe.yml`` as:
208245

209246
.. code-block:: yaml
210247
211-
- {dataset: UKESM1-0-LL, project: CMIP6, mip: Amon, exp: historical, grid: gn, ensemble: r1i1p1f2, start_year: 2004, end_year: 2014}
248+
- {dataset: UKESM1-0-LL, project: CMIP6, mip: Amon, exp: historical, grid: gn, ensemble: r1i1p1f2, start_year: 2004, end_year: 2014}
212249
213250
for a variable, e.g.:
214251

@@ -244,32 +281,6 @@ and finally, using the file naming definition from CMOR-DRS_ find the file:
244281
245282
.. _observations:
246283

247-
Observational data
248-
==================
249-
Observational data is retrieved in the same manner as CMIP data, for example
250-
using the ``OBS`` root path set to:
251-
252-
.. code-block:: yaml
253-
254-
OBS: /gws/nopw/j04/esmeval/obsdata-v2
255-
256-
and the dataset:
257-
258-
.. code-block:: yaml
259-
260-
- {dataset: ERA-Interim, project: OBS, type: reanaly, version: 1, start_year: 2014, end_year: 2015, tier: 3}
261-
262-
in ``recipe.yml`` in ``datasets`` or ``additional_datasets``, the rules set in
263-
CMOR-DRS_ are used again and the file will be automatically found:
264-
265-
.. code-block::
266-
267-
/gws/nopw/j04/esmeval/obsdata-v2/Tier3/ERA-Interim/OBS_ERA-Interim_reanaly_1_Amon_ta_201401-201412.nc
268-
269-
Since observational data are organized in Tiers depending on their level of
270-
public availability, the ``default`` directory must be structured accordingly
271-
with sub-directories ``TierX`` (``Tier1``, ``Tier2`` or ``Tier3``), even when
272-
``drs: default``.
273284

274285
Data loading
275286
============

doc/quickstart/index.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ Getting started
66

77
Installation <install>
88
Configuration <configure>
9-
Finding data <find_data>
9+
Input data <find_data>
1010
Installed recipes <recipes>
1111
Running <run>
1212
Output <output>
