diff --git a/doc/develop/fixing_data.rst b/doc/develop/fixing_data.rst index 3008863a34..cf1b35e66c 100644 --- a/doc/develop/fixing_data.rst +++ b/doc/develop/fixing_data.rst @@ -1,11 +1,15 @@ .. _fixing_data: *********** -Dataset fix +Fixing data *********** -Some (model) datasets contain (known) errors that would normally prevent them -from being processed correctly by the ESMValCore. The errors can be in +The baseline case for ESMValCore input data is CMOR fully compliant +data that is read using Iris' :func:`iris:iris.load_raw`. +ESMValCore also allows for some departures from compliance (see +:ref:`cmor_check_strictness`). Beyond that situation, some datasets +(either model or observations) contain (known) errors that would +normally prevent them from being processed. The issues can be in the metadata describing the dataset and/or in the actual data. Typical examples of such errors are missing or wrong attributes (e.g. attribute ''units'' says 1e-9 but data are actually in 1e-6), missing or @@ -13,20 +17,22 @@ mislabeled coordinates (e.g. ''lev'' instead of ''plev'' or missing coordinate bounds like ''lat_bnds'') or problems with the actual data (e.g. cloud liquid water only instead of sum of liquid + ice as specified by the CMIP data request). -The ESMValCore can apply on the fly fixes to datasets that have -known errors that can be fixed automatically. +As an extreme case, some data sources simply are not NetCDF +files and must go through some other data load function. + +The ESMValCore can apply on the fly fixes to such datasets when +issues can be fixed automatically. This is implemented for a set +of `Natively supported non-CMIP datasets`_. The following provides +details on how to design such fixes. .. note:: - **CMORization as a fix**. - Support for many observational and reanalysis datasets is implemented through - :ref:`CMORizer scripts in the ESMValTool `. 
- However, it is also possible to add support for a dataset that is not part of - a CMIP data request by implementing fixes for it. - This is particularly useful for large datasets, where keeping a copy of both - the original and CMORized dataset is not feasible. - See `Natively supported non-CMIP datasets`_ for a list of currently supported - datasets. + **CMORizer scripts**. Support for many observational and reanalysis + datasets is also available through prior reformatting by + :ref:`CMORizer scripts in the ESMValTool `, + which are mostly relevant for datasets of small volume. + +.. _fix_structure: Fix structure ============= @@ -326,7 +332,11 @@ strictness to the highest: Natively supported non-CMIP datasets ==================================== -Fixed datasets are supported through the ``native6`` project. +Some fixed datasets and native model formats are supported through +the ``native6`` project or through a dedicated project. + +Observational Datasets +---------------------- Put the files containing the data in the directory that you have configured for the ``native6`` project in your :ref:`user configuration file`, in a subdirectory called ``Tier{tier}/{dataset}/{version}/{frequency}/{short_name}``. @@ -335,13 +345,13 @@ definition in the :ref:`recipe `. Below is a list of datasets currently supported. ERA5 ----- +~~~~ - Supported variables: ``clt``, ``evspsbl``, ``evspsblpot``, ``mrro``, ``pr``, ``prsn``, ``ps``, ``psl``, ``ptype``, ``rls``, ``rlds``, ``rsds``, ``rsdt``, ``rss``, ``uas``, ``vas``, ``tas``, ``tasmax``, ``tasmin``, ``tdps``, ``ts``, ``tsn`` (``E1hr``/``Amon``), ``orog`` (``fx``) - Tier: 3 MSWEP ------ +~~~~~ - Supported variables: ``pr`` - Supported frequencies: ``mon``, ``day``, ``3hr``. @@ -354,6 +364,39 @@ For example for monthly data, place the files in the ``/Tier3/MSWEP/latestversio For more info: http://www.gloh2o.org/ +.. 
_fixing_native_models: + +Native models +------------- + +The following models are natively supported through the procedure described +above (:ref:`fix_structure`) and at :ref:`configure_native_models`: + +IPSL-CM6 +~~~~~~~~ + +Both output formats (i.e. the ``Output`` and the ``Analyse / Time series`` +formats) are supported, and can be configured in recipes as follows: + +.. code-block:: yaml + + datasets: + - {simulation: CM61-LR-hist-03.1950, exp: piControl, freq: Analyse/TS_MO, + account: p86caub, status: PROD, dataset: IPSL-CM6, project: IPSLCM, + root: /thredds/tgcc/store} + - {simulation: CM61-LR-hist-03.1950, exp: historical, freq: Output/MO, + account: p86caub, status: PROD, dataset: IPSL-CM6, project: IPSLCM, + root: /thredds/tgcc/store} + +.. _ipslcm_extra_facets_example: + +The ``Output`` format is an example of a case where variables are grouped in +multi-variable files, whose names cannot be computed from dataset +attributes alone but require an extra_facets file, whose principles are +explained in :ref:`extra_facets`, and whose content is :download:`available here +`. These multi-variable +files must also undergo some data selection. + .. _extra-facets-fixes: Use of extra facets in fixes @@ -370,4 +413,5 @@ variable to the rest of the processing chain. Normally, the applicable standard for variables is CMIP6. -For more details, refer to existing uses of this feature as examples. +For more details, refer to existing uses of this feature as examples, +e.g. :ref:`for IPSL-CM6`. diff --git a/doc/develop/index.rst b/doc/develop/index.rst index e10a5143f0..5d192448a8 100644 --- a/doc/develop/index.rst +++ b/doc/develop/index.rst @@ -10,5 +10,5 @@ features. 
:maxdepth: 1 Preprocessor function - Dataset fix + Fixing data Deriving a variable diff --git a/doc/quickstart/configure.rst b/doc/quickstart/configure.rst index accc7f87f4..98b4a7dcd6 100644 --- a/doc/quickstart/configure.rst +++ b/doc/quickstart/configure.rst @@ -177,7 +177,9 @@ It will be installed along with ESMValCore and can also be viewed on GitHub: `_. This configuration file describes the file system structure and CMOR tables for several key projects (CMIP6, CMIP5, obs4mips, OBS6, OBS) on several key machines (e.g. BADC, CP4CDS, DKRZ, -ETHZ, SMHI, BSC). CMIP data is stored as part of the Earth System Grid +ETHZ, SMHI, BSC), and for native output data for some +models (IPSL, ... see :ref:`configure_native_models`). +CMIP data is stored as part of the Earth System Grid Federation (ESGF) and the standards for file naming and paths to files are set out by CMOR and DRS. For a detailed description of these standards and their adoption in ESMValCore, we refer the user to :ref:`CMOR-DRS` section where we @@ -289,6 +291,48 @@ related to CMOR table settings available: to get the name of the file containing the ``mip`` table. Defaults to the value provided in ``cmor_type``. +.. _configure_native_models: + +Configuring native models and observation data sets +---------------------------------------------------- + +ESMValCore can be configured for handling native model output formats +and specific +observation data sets without preliminary reformatting. 
You can choose +to host this new data source either under a dedicated project or under +project ``native6``; when choosing the latter, such a configuration +involves the following steps: + + - allowing ESMValTool to locate the data files: + + - entry ``native6`` of ``config-developer.yml`` should be + complemented with sub-entries for ``input_dir`` and ``input_file`` + that go under a new key representing the + data organization (such as ``MY_DATA_ORG``), and these sub-entries can + use an arbitrary list of ``{placeholders}``. Example: + + .. code-block:: yaml + + native6: + ... + input_dir: + default: 'Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}' + MY_DATA_ORG: '{model}/{exp}/{simulation}/{version}/{type}' + input_file: + default: '*.nc' + MY_DATA_ORG: '{simulation}_*.nc' + ... + + - if necessary, provide a so-called ``extra facets file`` which + helps to cope with e.g. variable naming issues when finding + files. See :ref:`extra_facets` and :download:`this example of + such a file for IPSL-CM6 + <../../esmvalcore/_config/extra_facets/ipslcm-mappings.yml>`. + + - ensuring that ESMValCore gets the right metadata and data out of + your data files: this is described in :ref:`fixing_data`. + + .. _config-ref: References configuration file diff --git a/doc/quickstart/find_data.rst b/doc/quickstart/find_data.rst index 05905c04a1..96e0c995e0 100644 --- a/doc/quickstart/find_data.rst +++ b/doc/quickstart/find_data.rst @@ -1,7 +1,7 @@ .. _findingdata: ************ -Finding data +Input data ************ Overview @@ -15,10 +15,13 @@ the right data. We will detail below the data finding and retrieval process and the input the user needs to specify, giving examples on how to use the data finding routine under different scenarios. +Data types +========== + .. 
_CMOR-DRS: -CMIP data - CMOR Data Reference Syntax (DRS) and the ESGF -========================================================= +CMIP data +--------- CMIP data is widely available via the Earth System Grid Federation (`ESGF `_) and is accessible to users either via download from the ESGF portal or through the ESGF data nodes hosted @@ -45,6 +48,40 @@ From the ESMValTool user perspective the number of data input parameters is optimized to allow for ease of use. We detail this procedure in the next section. +Native model data +----------------- +Support for native model data that is not formatted according to a CMIP +data request can be added using the basic +:ref:`ESMValCore fix procedure ` and has been implemented +for some models :ref:`as described here `. + +Observational data +------------------ +Some observational data is retrieved in the same manner as CMIP data, for example +using the ``OBS`` root path set to: + + .. code-block:: yaml + + OBS: /gws/nopw/j04/esmeval/obsdata-v2 + +and the dataset: + + .. code-block:: yaml + + - {dataset: ERA-Interim, project: OBS, type: reanaly, version: 1, start_year: 2014, end_year: 2015, tier: 3} + +in ``recipe.yml`` in ``datasets`` or ``additional_datasets``, the rules set in +CMOR-DRS_ are used again and the file will be automatically found: + +.. code-block:: + + /gws/nopw/j04/esmeval/obsdata-v2/Tier3/ERA-Interim/OBS_ERA-Interim_reanaly_1_Amon_ta_201401-201412.nc + +Since observational data are organized in Tiers depending on their level of +public availability, the ``default`` directory must be structured accordingly +with sub-directories ``TierX`` (``Tier1``, ``Tier2`` or ``Tier3``), even when +``drs: default``. + .. _data-retrieval: Data retrieval @@ -186,8 +223,8 @@ datasets are listed in any recipe, under either the ``datasets`` and/or .. 
code-block:: yaml datasets: - - {dataset: HadGEM2-CC, project: CMIP5, exp: historical, ensemble: r1i1p1, start_year: 2001, end_year: 2004} - - {dataset: UKESM1-0-LL, project: CMIP6, exp: historical, ensemble: r1i1p1f2, grid: gn, start_year: 2004, end_year: 2014} + - {dataset: HadGEM2-CC, project: CMIP5, exp: historical, ensemble: r1i1p1, start_year: 2001, end_year: 2004} + - {dataset: UKESM1-0-LL, project: CMIP6, exp: historical, ensemble: r1i1p1f2, grid: gn, start_year: 2004, end_year: 2014} ``_data_finder`` will use this information to find data for **all** the variables specified in ``diagnostics/variables``. @@ -208,7 +245,7 @@ and the dataset you need is specified in your ``recipe.yml`` as: .. code-block:: yaml - - {dataset: UKESM1-0-LL, project: CMIP6, mip: Amon, exp: historical, grid: gn, ensemble: r1i1p1f2, start_year: 2004, end_year: 2014} + - {dataset: UKESM1-0-LL, project: CMIP6, mip: Amon, exp: historical, grid: gn, ensemble: r1i1p1f2, start_year: 2004, end_year: 2014} for a variable, e.g.: @@ -244,32 +281,6 @@ and finally, using the file naming definition from CMOR-DRS_ find the file: .. _observations: -Observational data -================== -Observational data is retrieved in the same manner as CMIP data, for example -using the ``OBS`` root path set to: - - .. code-block:: yaml - - OBS: /gws/nopw/j04/esmeval/obsdata-v2 - -and the dataset: - - .. code-block:: yaml - - - {dataset: ERA-Interim, project: OBS, type: reanaly, version: 1, start_year: 2014, end_year: 2015, tier: 3} - -in ``recipe.yml`` in ``datasets`` or ``additional_datasets``, the rules set in -CMOR-DRS_ are used again and the file will be automatically found: - -.. 
code-block:: - - /gws/nopw/j04/esmeval/obsdata-v2/Tier3/ERA-Interim/OBS_ERA-Interim_reanaly_1_Amon_ta_201401-201412.nc - -Since observational data are organized in Tiers depending on their level of -public availability, the ``default`` directory must be structured accordingly -with sub-directories ``TierX`` (``Tier1``, ``Tier2`` or ``Tier3``), even when -``drs: default``. Data loading ============ diff --git a/doc/quickstart/index.rst b/doc/quickstart/index.rst index 4f9887f76f..2fb022bfda 100644 --- a/doc/quickstart/index.rst +++ b/doc/quickstart/index.rst @@ -6,7 +6,7 @@ Getting started Installation Configuration - Finding data + Input data Installed recipes Running Output diff --git a/esmvalcore/_config/extra_facets/ipslcm-mappings.yml b/esmvalcore/_config/extra_facets/ipslcm-mappings.yml new file mode 100644 index 0000000000..eb99044a6a --- /dev/null +++ b/esmvalcore/_config/extra_facets/ipslcm-mappings.yml @@ -0,0 +1,264 @@ +# Mapping, for IPSLCM output formats 'Analyse' and 'Output', between a +# CMOR variable name and the labels used by ESMValTool to find the +# corresponding file, and the corresponding variable in the file +# +# For format 'Analyse', the config-developer.yml file tells +# ESMValTool to use key 'ipsl_varname' for building the filename, +# while for format 'Output' it specifies to use key 'group' +# +# Specifying 'igcm_dir' here avoids having to specify it in +# dataset definitions +# +# Key 'use_cdo' controls whether CDO will be invoked for +# selecting a variable in a multi-variable file. This generally allows +# for smaller overall load time. But because CDO has a licence which is +# not compliant with ESMValTool licence policy, the default +# configuration is to avoid using it. 
You may use customized settings +# by installing a modified version of this file as +# ~/.esmvaltool/variable_details/ipslcm-*.yml +# see: https://docs.esmvaltool.org/projects/esmvalcore/en/latest/quickstart/configure.html#location-of-the-extra-facets-files +# +# Key 'positive' tells ESMValTool when the sign convention +# for a variable is reversed between IPSL-CM6 and CMIP6. In that case, +# put e.g. 'positive: down' if the CMIP6 convention is 'positive: up' +# +# The main key below, 'IPSL-CM6', is the one to use as a value for +# attribute 'dataset' in the 'datasets' entry of recipes; it matches +# the module name 'ipsl_cm6.py' in 'cmor/_fixes/ipslcm/ipsl_cm6.py' +# +--- +# A series of shortcuts for repetitive settings +ShortCuts: + General: &gene {model: IPSLCM6, use_cdo: false} + ATM3DVARS: &atm3dvars {group: histmthNMC, dir: ATM, <<: *gene} + Atmvars: &atmvars {group: histmth, dir: ATM, <<: *gene} + SrfVars: &srfvars {group: sechiba_history, dir: SRF, <<: *gene} + StoVars: &stovars {group: stomate_history, dir: SBG, <<: *gene} + StiVars: &stivars {group: stomate_ipcc_history, dir: SBG, <<: *gene} + SechVars: &sechvars {group: sechiba_history, dir: SBG, <<: *gene} + OceTVars: &ocetvars {group: grid_T, dir: OCE, <<: *gene} + OceUVars: &oceuvars {group: grid_U, dir: OCE, <<: *gene} + OceVVars: &ocevvars {group: grid_V, dir: OCE, <<: *gene} + OceDvars: &ocedvars {group: diaptr, dir: OCE, <<: *gene} + OcePtr: &oceptr {group: ptrc_T, dir: BGC, <<: *gene} + IceVars: &icevars {group: icemod, dir: ICE, <<: *gene} + + +IPSL-CM6: + # ================================================= + Amon: + # ==================================================== + # ATM variables processed using their CMIP6 short_name + # ==================================================== + # ATM 3D Variables + ta: {ipsl_varname: ta, <<: *atm3dvars} + ua: {ipsl_varname: ua, <<: *atm3dvars} + va: {ipsl_varname: va, <<: *atm3dvars} + zg: {ipsl_varname: zg, <<: *atm3dvars} + hus: 
{ipsl_varname: hus, <<: *atm3dvars} + hur: {ipsl_varname: hur, <<: *atm3dvars} + + # ATM general variables + pr: {ipsl_varname: precip, <<: *atmvars} + psl: {ipsl_varname: slp, <<: *atmvars} + tas: {ipsl_varname: t2m, <<: *atmvars} + ts: {ipsl_varname: tsol, <<: *atmvars} + huss: {ipsl_varname: q2m, <<: *atmvars} + uas: {ipsl_varname: u10m, <<: *atmvars} + vas: {ipsl_varname: v10m, <<: *atmvars} + sfcWind: {ipsl_varname: wind10m, <<: *atmvars} + hurs: {ipsl_varname: rh2m, <<: *atmvars} + prw: {ipsl_varname: prw, <<: *atmvars} + t2m: {ipsl_varname: t2m, <<: *atmvars} + q2m: {ipsl_varname: q2m, <<: *atmvars} + u10m: {ipsl_varname: u10m, <<: *atmvars} + v10m: {ipsl_varname: v10m, <<: *atmvars} + wind10m: {ipsl_varname: wind10m, <<: *atmvars} + + # -> Turbulent fluxes + hfls: {ipsl_varname: flat, <<: *atmvars, positive: down} + hfss: {ipsl_varname: sens, <<: *atmvars, positive: down} + tauu: {ipsl_varname: taux, <<: *atmvars} + tauv: {ipsl_varname: tauy, <<: *atmvars} + + # -> Clouds + clt: {ipsl_varname: cldt, <<: *atmvars} + + # -> Radiative up at TOA + rlut: {ipsl_varname: topl, <<: *atmvars} + rsut: {ipsl_varname: SWupTOA, <<: *atmvars} + rlutcs: {ipsl_varname: topl0, <<: *atmvars} + rsutcs: {ipsl_varname: SWupTOAclr, <<: *atmvars} + + # -> Radiative down at TOA + rsdt: {ipsl_varname: SWdnTOA, <<: *atmvars} + + # -> Radiative up at Surface + rlus: {ipsl_varname: LWupSFC, <<: *atmvars} + rsus: {ipsl_varname: SWupSFC, <<: *atmvars} + rsuscs: {ipsl_varname: SWupSFcclr, <<: *atmvars} + rluscs: {ipsl_varname: LWupSFcclr, <<: *atmvars} + + # -> Radiative down at Surface + rlds: {ipsl_varname: LWdnSFC, <<: *atmvars} + rsds: {ipsl_varname: SWdnSFC, <<: *atmvars} + rldscs: {ipsl_varname: LWdnSFcclr, <<: *atmvars} + rsdscs: {ipsl_varname: SWdnSFcclr, <<: *atmvars} + + # ======================================================= + # ATM variables processed using their own IPSL short_name + # ======================================================= + # -> general variables + 
precip: {ipsl_varname: precip, <<: *atmvars} + slp: {ipsl_varname: slp, <<: *atmvars} + + # -> Turbulent fluxes + taux: {ipsl_varname: taux, <<: *atmvars} + tauy: {ipsl_varname: tauy, <<: *atmvars} + + # -> Radiative down at TOA + SWdnTOA: {ipsl_varname: SWdnTOA, <<: *atmvars} + + # -> Radiative up at TOA + topl: {ipsl_varname: topl, <<: *atmvars} + SWupTOA: {ipsl_varname: SWupTOA, <<: *atmvars} + topl0: {ipsl_varname: topl0, <<: *atmvars} + SWupTOAclr: {ipsl_varname: SWupTOAclr, <<: *atmvars} + + # -> Radiative up at Surface + LWupSFC: {ipsl_varname: LWupSFC, <<: *atmvars} + SWupSFC: {ipsl_varname: SWupSFC, <<: *atmvars} + SWupSFcclr: {ipsl_varname: SWupSFcclr, <<: *atmvars} + LWupSFcclr: {ipsl_varname: LWupSFcclr, <<: *atmvars} + + # -> Radiative down at Surface + LWdnSFC: {ipsl_varname: LWdnSFC, <<: *atmvars} + SWdnSFC: {ipsl_varname: SWdnSFC, <<: *atmvars} + LWdnSFcclr: {ipsl_varname: LWdnSFcclr, <<: *atmvars} + SWdnSFcclr: {ipsl_varname: SWdnSFcclr, <<: *atmvars} + + + # ================================================= + Lmon: + # =============================================== + # SRF -- Land surface - ORCHIDEE + # ==================================================== + # variables processed using their CMIP6 short_name + # ==================================================== + + mrrob: {ipsl_varname: drainage, <<: *srfvars} + runoff: {ipsl_varname: runoff, <<: *srfvars} + mrros: {ipsl_varname: runoff, <<: *srfvars} + lai: {ipsl_varname: lai, <<: *stivars} + + # ======================================================= + # variables processed using their own IPSL short_name + # ======================================================= + drainage: {ipsl_varname: drainage, <<: *srfvars} + snow: {ipsl_varname: snow, <<: *srfvars} + snw_land: {ipsl_varname: snow, <<: *srfvars} + fluxlat: {ipsl_varname: fluxlat, <<: *srfvars} + fluxsens: {ipsl_varname: fluxsens, <<: *srfvars} + albnir: {ipsl_varname: alb_nir, <<: *srfvars} + albvis: {ipsl_varname: alb_vis, <<: 
*srfvars} + tair: {ipsl_varname: tair, <<: *srfvars} + swdown: {ipsl_varname: swdown, <<: *srfvars} + lwdown: {ipsl_varname: lwdown, <<: *srfvars} + transpir: {ipsl_varname: transpir, <<: *srfvars} + evapnu: {ipsl_varname: evapnu, <<: *srfvars} + es: {ipsl_varname: evapnu, <<: *srfvars} + inter: {ipsl_varname: inter, <<: *srfvars} + subli: {ipsl_varname: subli, <<: *srfvars} + evap: {ipsl_varname: evap, <<: *srfvars} + Qs: {ipsl_varname: Qs, <<: *srfvars} + frac_snow: {ipsl_varname: frac_snow, <<: *srfvars} + maint_resp: {ipsl_varname: maint_resp, <<: *srfvars} + growth_resp: {ipsl_varname: growth_resp, <<: *srfvars} + hetero_resp: {ipsl_varname: hetero_resp, <<: *srfvars} + maintresp: {ipsl_varname: maint_resp, <<: *srfvars} + growthresp: {ipsl_varname: growth_resp, <<: *srfvars} + heteroresp: {ipsl_varname: hetero_resp, <<: *srfvars} + nee: {ipsl_varname: nee, <<: *srfvars} + + # SBG + total_soil_carb: {ipsl_varname: TOTAL_SOIL_CARB, <<: *stovars} + totalsoilcarb: {ipsl_varname: TOTAL_SOIL_CARB, <<: *stovars} + maxvegetfrac: {ipsl_varname: maxvegetfrac, <<: *sechvars} + vegetfrac: {ipsl_varname: vegetfrac, <<: *sechvars} + cfracgpp: {ipsl_varname: gpp, <<: *stivars} + + # -> alias for the obs + gpptot: {ipsl_varname: gpp, <<: *stivars} + Contfrac: {ipsl_varname: Contfrac, <<: *sechvars} + + # ================================================= + Omon: + # =============================================== + # OCE + # ==================================================== + # variables processed using their CMIP6 short_name + # ==================================================== + tos: {ipsl_varname: tos, <<: *ocetvars} + sos: {ipsl_varname: sos, <<: *ocetvars} + thetao: {ipsl_varname: thetao, <<: *ocetvars} + so: {ipsl_varname: so, <<: *ocetvars} + zos: {ipsl_varname: zos, <<: *ocetvars} + mlotst: {ipsl_varname: mldr10_1, <<: *ocetvars} + wfo: {ipsl_varname: wfo, <<: *ocetvars} + + # -- Wind stress curl + tauuo: {ipsl_varname: tauuo, <<: *oceuvars} + tauvo: 
{ipsl_varname: tauvo, <<: *oceuvars} + + # ======================================================= + # variables processed using their own IPSL short_name + # ======================================================= + mlddt02: {ipsl_varname: mld_dt02, <<: *ocetvars} + + # ---------------------------------------------- # + # Aliases to the zonal average (computed on the x axis of the ORCA grid) + zotemglo: {ipsl_varname: zotemglo, <<: *ocedvars} + zotempac: {ipsl_varname: zotempac, <<: *ocedvars} + zotematl: {ipsl_varname: zotematl, <<: *ocedvars} + zotemind: {ipsl_varname: zotemind, <<: *ocedvars} + zosalglo: {ipsl_varname: zosalglo, <<: *ocedvars} + zosalpac: {ipsl_varname: zosalpac, <<: *ocedvars} + zosalatl: {ipsl_varname: zosalatl, <<: *ocedvars} + zosalind: {ipsl_varname: zosalind, <<: *ocedvars} + zomsfglo: {ipsl_varname: zomsfglo, <<: *ocedvars} + zomsfpac: {ipsl_varname: zomsfpac, <<: *ocedvars} + zomsfatl: {ipsl_varname: zomsfatl, <<: *ocedvars} + zomsfind: {ipsl_varname: zomsfind, <<: *ocedvars} + + # --------------------------------------------------- # + # Aliases to the old IGCM_OUT names + sosstsst: {ipsl_varname: sosstsst, <<: *ocetvars} + sosaline: {ipsl_varname: sosaline, <<: *ocetvars} + votemper: {ipsl_varname: votemper, <<: *ocetvars} + vosaline: {ipsl_varname: vosaline, <<: *ocetvars} + mldr10_3: {ipsl_varname: mldr10_3, <<: *ocetvars} + somx3010: {ipsl_varname: somx3010, <<: *ocetvars} + mld_dt02: {ipsl_varname: mld_dt02, <<: *ocetvars} + + # BGC -> Biogeochemistry + NO3: {ipsl_varname: NO3, <<: *oceptr} + PO4: {ipsl_varname: PO4, <<: *oceptr} + Si: {ipsl_varname: Si, <<: *oceptr} + O2: {ipsl_varname: O2, <<: *oceptr} + + # ================================================= + SImon: + # =============================================== + # ICE + # ==================================================== + # variables processed using their CMIP6 short_name + # ==================================================== + sivolu: {ipsl_varname: sivolu, <<: 
*icevars} + siconc: {ipsl_varname: siconc, <<: *icevars} + sithick: {ipsl_varname: sithic, <<: *icevars} + + # ======================================================= + # variables processed using their own IPSL short_name + # ======================================================= + sic: {ipsl_varname: siconc, <<: *icevars} + sit: {ipsl_varname: sithic, <<: *icevars} diff --git a/esmvalcore/cmor/_fixes/fix.py b/esmvalcore/cmor/_fixes/fix.py index 3e1e9a8a00..be94c4a774 100644 --- a/esmvalcore/cmor/_fixes/fix.py +++ b/esmvalcore/cmor/_fixes/fix.py @@ -1,7 +1,7 @@ """Contains the base class for dataset fixes.""" import importlib -import inspect import os +import inspect from ..table import CMOR_TABLES diff --git a/esmvalcore/cmor/_fixes/ipslcm/__init__.py b/esmvalcore/cmor/_fixes/ipslcm/__init__.py new file mode 100644 index 0000000000..d783f0a9b8 --- /dev/null +++ b/esmvalcore/cmor/_fixes/ipslcm/__init__.py @@ -0,0 +1 @@ +"""Fixes for IPSLCM data.""" diff --git a/esmvalcore/cmor/_fixes/ipslcm/ipsl_cm6.py b/esmvalcore/cmor/_fixes/ipslcm/ipsl_cm6.py new file mode 100644 index 0000000000..dd978b33af --- /dev/null +++ b/esmvalcore/cmor/_fixes/ipslcm/ipsl_cm6.py @@ -0,0 +1,104 @@ +"""Fixes for IPSLCM6 TS output format.""" +import logging +import subprocess +import time + +from ..fix import Fix +from ..shared import add_scalar_height_coord + +logger = logging.getLogger(__name__) + +# The key used in extra_facets file for providing the +# variable name (in NetCDF file) that matches the CMOR variable name +VARNAME_KEY = "ipsl_varname" + + +class AllVars(Fix): + """Fixes for all IPSLCM variables.""" + + def fix_file(self, filepath, output_dir): + """Select IPSLCM variable in filepath. + + This is done only if input file is a multi-variable one. This + is diagnosed by searching in the input file path for the + extra_facets value for key 'group'. 
+ + In such cases, it is worth using an external tool for + filtering, at least until Iris loading becomes fast (which is not the case + up to, and including, V3.0.2), and CDO can be used, depending on + extra_facets key `use_cdo`. + + However, we comply with the ESMValTool policy regarding dependency licences. + + """ + if "_" + self.extra_facets.get("group", + "non-sense") + ".nc" not in filepath: + # No need to filter the file + logger.debug("Not filtering for %s", filepath) + return filepath + + if not self.extra_facets.get("use_cdo", False): + # The configuration developer doesn't provide CDO, while ESMValTool + # licence policy doesn't allow including it in dependencies, + # or they consider that a plain Iris load is quick enough for + # that file + logger.debug("In ipsl_cm6.py: CDO not activated for %s", filepath) + return filepath + + # Proceed with CDO selvar + varname = self.extra_facets.get(VARNAME_KEY, self.vardef.short_name) + alt_filepath = filepath.replace(".nc", "_cdo_selected.nc") + outfile = self.get_fixed_filepath(output_dir, alt_filepath) + tim1 = time.time() + logger.debug("Using CDO for selecting %s in %s", varname, filepath) + command = ["cdo", "-selvar,%s" % varname, filepath, outfile] + subprocess.run(command, check=True) + logger.debug("CDO selection done in %.2f seconds", time.time() - tim1) + return outfile + + def fix_metadata(self, cubes): + """Fix metadata for any IPSLCM variable + filter out other variables. + + Fix the name of the time coordinate, which is called time_counter + in the original file. 
+ + Remove standard_name 'time' in auxiliary time coordinates + """ + logger.debug("Fixing metadata for ipsl_cm6") + + varname = self.extra_facets.get(VARNAME_KEY, self.vardef.short_name) + cube = self.get_cube_from_list(cubes, varname) + cube.var_name = self.vardef.short_name + + # Need to degrade auxiliary time coordinates, because some + # Iris functions do not support more than one + # coordinate with standard_name='time' + for coordinate in cube.coords(dim_coords=False): + if coordinate.standard_name == 'time': + coordinate.standard_name = '' + + # Fix variable name for time_counter + for coordinate in cube.coords(dim_coords=True): + if coordinate.var_name == 'time_counter': + coordinate.var_name = 'time' + + positive = self.extra_facets.get("positive") + if positive: + cube.attributes["positive"] = positive + + return [cube] + + +class Tas(Fix): + """Fixes for IPSLCM 2m temperature.""" + + def fix_metadata(self, cubes): + """Add height2m.""" + varname = self.extra_facets.get(VARNAME_KEY) + cube = self.get_cube_from_list(cubes, varname) + add_scalar_height_coord(cube) + return cubes + + +class Huss(Tas): + """Fixes for IPSLCM 2m specific humidity.""" diff --git a/esmvalcore/config-developer.yml b/esmvalcore/config-developer.yml index 19b00482da..b483de3a1f 100644 --- a/esmvalcore/config-developer.yml +++ b/esmvalcore/config-developer.yml @@ -114,6 +114,7 @@ CMIP3: default: '/' BADC: '{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{short_name}/{ensemble}/{latestversion}' DKRZ: '{exp}/{modeling_realm}/{frequency}/{short_name}/{dataset}/{ensemble}' + IPSL: '{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{ensemble}/{short_name}/{version}/{short_name}' input_file: '{short_name}_*.nc' output_file: '{project}_{institute}_{dataset}_{mip}_{exp}_{ensemble}_{short_name}_{start_year}-{end_year}' cmor_type: 'CMIP3' @@ -205,6 +206,7 @@ obs4mips: input_dir: default: 'Tier{tier}/{dataset}' RCAST: '/' + IPSL: 
'{realm}/{short_name}/{freq}/{grid}/{institute}/{dataset}/{latest_version}' input_file: '{short_name}_{dataset}_{level}_{version}_*.nc' output_file: '{project}_{dataset}_{level}_{version}_{short_name}' cmor_type: 'CMIP6' @@ -237,3 +239,16 @@ CORDEX: output_file: '{short_name}_{dataset}_{exp}_{ensemble}_{rcm_version}_{mip}' cmor_type: 'CMIP5' cmor_path: 'cordex' + +IPSLCM: + cmor_strict: false + input_dir: + default: '{root}/{account}/{model}/{status}/{exp}/{simulation}/{dir}/{freq}' + input_file: + default: + - '{simulation}_*_{ipsl_varname}.nc' + - '{simulation}_*_{group}.nc' + output_file: '{dataset}_{account}_{model}_{status}_{exp}_{simulation}_{freq}_{short_name}' + cmor_type: 'CMIP6' + cmor_default_table_prefix: 'CMIP6_' + diff --git a/esmvalcore/config-user.yml b/esmvalcore/config-user.yml index 257b358b06..f408cc42e0 100644 --- a/esmvalcore/config-user.yml +++ b/esmvalcore/config-user.yml @@ -106,3 +106,26 @@ profile_diagnostic: false # CMIP6: ETHZ # CMIP5: ETHZ # CMIP3: ETHZ + +# Site-specific entries: IPSL +# Uncomment the lines below to locate data on Ciclad at IPSL +#rootpath: +# IPSLCM: / +# CMIP5: /bdd/CMIP5/output +# CMIP6: /bdd/CMIP6 +# CMIP3: /bdd/CMIP3 +# CORDEX: /bdd/CORDEX/output +# obs4mips: /bdd/obs4MIPS/obs-CFMIP/observations +# ana4mips: /not_yet +# OBS: /not_yet +# OBS6: /not_yet +# RAWOBS: /not_yet +#drs: +# CMIP6: DKRZ +# CMIP5: DKRZ +# CMIP3: IPSL +# CORDEX: BADC +# obs4mips: IPSL +# ana4mips: default +# OBS: not_yet +# OBS6: not_yet diff --git a/esmvalcore/preprocessor/_io.py b/esmvalcore/preprocessor/_io.py index 938e4b6f2e..fbe9eee7a1 100644 --- a/esmvalcore/preprocessor/_io.py +++ b/esmvalcore/preprocessor/_io.py @@ -117,8 +117,8 @@ def load(file, callback=None): category=UserWarning, module='iris', ) - raw_cubes = iris.load_raw(file, callback=callback) + logger.debug("Done with loading %s", file) if not raw_cubes: raise Exception('Can not load cubes from {0}'.format(file)) for cube in raw_cubes: diff --git 
a/tests/integration/cmor/_fixes/ipslcm/test_ipsl_cm6.py b/tests/integration/cmor/_fixes/ipslcm/test_ipsl_cm6.py new file mode 100644 index 0000000000..f136114bcf --- /dev/null +++ b/tests/integration/cmor/_fixes/ipslcm/test_ipsl_cm6.py @@ -0,0 +1,37 @@ +"""Tests for the fixes of IPSL-CM6.""" +import iris +import pytest + +from esmvalcore.cmor._fixes.ipslcm.ipsl_cm6 import Tas +from esmvalcore.cmor.fix import Fix +from esmvalcore.cmor.table import get_var_info + + +@pytest.fixture +def test_get_tas_fix(): + """Test getting of fix.""" + fix = Fix.get_fixes('IPSLCM', 'IPSL-CM6', 'Amon', 'tas') + assert fix == [Tas(None)] + + +@pytest.fixture +def cubes(): + """``tas`` cube.""" + + cube = iris.cube.Cube( + [200.0], # chilly, isn't it ? + var_name='tas', + standard_name='air_temperature', + units='K', + ) + return iris.cube.CubeList([cube]) + + +def test_tas_fix_metadata(cubes): + """Test ``fix_metadata`` for ``tas``.""" + vardef = get_var_info('CMIP6', 'Amon', 'tas') + fix = Tas(vardef) + out_cubes = fix.fix_metadata(cubes) + out_cube = fix.get_cube_from_list(out_cubes, 'tas') + assert any([coord.standard_name == 'height' + for coord in out_cube.aux_coords])
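
The decision logic in ``AllVars.fix_file`` can be reviewed in isolation, without Iris or CDO installed. The following is a minimal sketch, not code from this change set: the helper names ``needs_variable_selection`` and ``cdo_selvar_command`` and the example file names are hypothetical, and only the ``group``, ``use_cdo`` and ``ipsl_varname`` keys are taken from the diff. Unlike the original, it returns early when ``group`` is missing instead of comparing against a ``"non-sense"`` placeholder.

```python
def needs_variable_selection(filepath, extra_facets):
    """Return True if ``filepath`` looks like a multi-variable file.

    Same test as in ``AllVars.fix_file``: the path must contain
    ``_<group>.nc``, where ``group`` is read from the extra facets.
    """
    group = extra_facets.get("group")
    if group is None:
        # Assumption: no 'group' facet means nothing to filter.
        return False
    return f"_{group}.nc" in filepath


def cdo_selvar_command(filepath, outfile, extra_facets, short_name):
    """Build the ``cdo -selvar`` argument list, or None if CDO is not used."""
    if not needs_variable_selection(filepath, extra_facets):
        return None
    if not extra_facets.get("use_cdo", False):
        # CDO is opt-in because of the licence policy discussed above.
        return None
    varname = extra_facets.get("ipsl_varname", short_name)
    return ["cdo", f"-selvar,{varname}", filepath, outfile]


if __name__ == "__main__":
    # Hypothetical facets and paths, for illustration only.
    facets = {"group": "histmth", "ipsl_varname": "t2m", "use_cdo": True}
    print(cdo_selvar_command("/data/SIM_1M_histmth.nc", "/tmp/out.nc",
                             facets, "tas"))
```

Keeping the path test and the command construction free of side effects would make them easy to cover with unit tests, while the ``subprocess.run`` call stays in ``fix_file``.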