ESMValGroup
diff --git a/doc/develop/fixing_data.rst b/doc/develop/fixing_data.rst
Lines changed: 62 additions & 18 deletions
diff --git a/doc/develop/index.rst b/doc/develop/index.rst
Lines changed: 1 addition & 1 deletion
diff --git a/doc/quickstart/configure.rst b/doc/quickstart/configure.rst
Lines changed: 45 additions & 1 deletion
diff --git a/doc/quickstart/find_data.rst b/doc/quickstart/find_data.rst
Lines changed: 43 additions & 32 deletions
diff --git a/doc/quickstart/index.rst b/doc/quickstart/index.rst
Lines changed: 1 addition & 1 deletion
@@ -1,32 +1,38 @@
 .. _fixing_data:
 
 ***********
-Dataset fix
+Fixing data
 ***********
 
-Some (model) datasets contain (known) errors that would normally prevent them
-from being processed correctly by the ESMValCore. The errors can be in
+The baseline case for ESMValCore input data is fully CMOR-compliant
+data that is read using Iris' :func:`iris:iris.load_raw`.
+ESMValCore also allows for some departures from compliance (see
+:ref:`cmor_check_strictness`). Beyond that situation, some datasets
+(either model or observations) contain (known) errors that would
+normally prevent them from being processed. The issues can be in
 the metadata describing the dataset and/or in the actual data.
 Typical examples of such errors are missing or wrong attributes (e.g.
 attribute ''units'' says 1e-9 but data are actually in 1e-6), missing or
 mislabeled coordinates (e.g. ''lev'' instead of ''plev'' or missing
 coordinate bounds like ''lat_bnds'') or problems with the actual data
 (e.g. cloud liquid water only instead of sum of liquid + ice as specified by the CMIP data request).
 
-The ESMValCore can apply on the fly fixes to datasets that have
-known errors that can be fixed automatically.
+As an extreme case, some data sources are simply not NetCDF
+files and must be read through some other data load function.
+
+The ESMValCore can apply on-the-fly fixes to such datasets when
+the issues can be fixed automatically. This is implemented for a set
+of `Natively supported non-CMIP datasets`_. The following provides
+details on how to design such fixes.
 
 .. note::
-  **CMORization as a fix**.
-  Support for many observational and reanalysis datasets is implemented through
-  :ref:`CMORizer scripts in the ESMValTool <esmvaltool:new-dataset>`.
-  However, it is also possible to add support for a dataset that is not part of
-  a CMIP data request by implementing fixes for it.
-  This is particularly useful for large datasets, where keeping a copy of both
-  the original and CMORized dataset is not feasible.
-  See `Natively supported non-CMIP datasets`_ for a list of currently supported
-  datasets.
 
+  **CMORizer scripts**. Support for many observational and reanalysis
+  datasets is also possible through a priori reformatting by
+  :ref:`CMORizer scripts in the ESMValTool <esmvaltool:new-dataset>`,
+  which are mostly relevant for small-volume datasets.
+
+.. _fix_structure:
 
 Fix structure
 =============
@@ -326,7 +332,11 @@ strictness to the highest:
 Natively supported non-CMIP datasets
 ====================================
 
-Fixed datasets are supported through the ``native6`` project.
+Some fixed datasets and native model formats are supported through
+the ``native6`` project or through a dedicated project.
+
+Observational Datasets
+----------------------
 Put the files containing the data in the directory that you have configured
 for the ``native6`` project in your :ref:`user configuration file`, in a
 subdirectory called ``Tier{tier}/{dataset}/{version}/{frequency}/{short_name}``.
@@ -335,13 +345,13 @@ definition in the :ref:`recipe <recipe_overview>`.
 Below is a list of datasets currently supported.
 
 ERA5
-----
+~~~~
 
 - Supported variables: ``clt``, ``evspsbl``, ``evspsblpot``, ``mrro``, ``pr``, ``prsn``, ``ps``, ``psl``, ``ptype``, ``rls``, ``rlds``, ``rsds``, ``rsdt``, ``rss``, ``uas``, ``vas``, ``tas``, ``tasmax``, ``tasmin``, ``tdps``, ``ts``, ``tsn`` (``E1hr``/``Amon``), ``orog`` (``fx``)
 - Tier: 3
 
 MSWEP
------
+~~~~~
 
 - Supported variables: ``pr``
 - Supported frequencies: ``mon``, ``day``, ``3hr``.
@@ -354,6 +364,39 @@ For example for monthly data, place the files in the ``/Tier3/MSWEP/latestversio
 
 For more info: http://www.gloh2o.org/
 
+.. _fixing_native_models:
+
+Native models
+-------------
+
+The following models are natively supported through the procedure described
+above (:ref:`fix_structure`) and at :ref:`configure_native_models`:
+
+IPSL-CM6
+~~~~~~~~
+
+Both output formats (i.e. the ``Output`` and the ``Analyse / Time series``
+formats) are supported and should be configured in recipes as, for example:
+
+.. code-block:: yaml
+
+  datasets:
+    - {simulation: CM61-LR-hist-03.1950, exp: piControl, freq: Analyse/TS_MO,
+       account: p86caub,  status: PROD, dataset: IPSL-CM6, project: IPSLCM,
+       root: /thredds/tgcc/store}
+    - {simulation: CM61-LR-hist-03.1950, exp: historical, freq: Output/MO,
+       account: p86caub,  status: PROD, dataset: IPSL-CM6, project: IPSLCM,
+       root: /thredds/tgcc/store}
+
+.. _ipslcm_extra_facets_example:
+
+The ``Output`` format is an example of a case where variables are grouped in
+multi-variable files whose names cannot be computed from dataset attributes
+alone and instead require an extra facets file. The principles of such files
+are explained in :ref:`extra_facets`; the content is :download:`available here
+</../esmvalcore/_config/extra_facets/ipslcm-mappings.yml>`. These multi-variable
+files must also undergo some data selection.
+
 .. _extra-facets-fixes:
 
 Use of extra facets in fixes
@@ -370,4 +413,5 @@ variable to the rest of the processing chain.
 
 Normally, the applicable standard for variables is CMIP6.
 
-For more details, refer to existing uses of this feature as examples.
+For more details, refer to existing uses of this feature as examples,
+e.g. :ref:`for IPSL-CM6<ipslcm_extra_facets_example>`.
@@ -10,5 +10,5 @@ features.
    :maxdepth: 1
 
     Preprocessor function <preprocessor_function>
-    Dataset fix <fixing_data>
+    Fixing data <fixing_data>
     Deriving a variable <derivation>
@@ -177,7 +177,9 @@ It will be installed along with ESMValCore and can also be viewed on GitHub:
 <https://github.com/ESMValGroup/ESMValCore/blob/main/esmvalcore/config-developer.yml>`_.
 This configuration file describes the file system structure and CMOR tables for several
 key projects (CMIP6, CMIP5, obs4mips, OBS6, OBS) on several key machines (e.g. BADC, CP4CDS, DKRZ,
-ETHZ, SMHI, BSC). CMIP data is stored as part of the Earth System Grid
+ETHZ, SMHI, BSC), and for native output data for some
+models (IPSL, ... see :ref:`configure_native_models`).
+CMIP data is stored as part of the Earth System Grid
 Federation (ESGF) and the standards for file naming and paths to files are set
 out by CMOR and DRS. For a detailed description of these standards and their
 adoption in ESMValCore, we refer the user to :ref:`CMOR-DRS` section where we
@@ -289,6 +291,48 @@ related to CMOR table settings available:
   to get the name of the file containing the ``mip`` table.
   Defaults to the value provided in ``cmor_type``.
 
+.. _configure_native_models:
+
+Configuring native models and observation data sets
+----------------------------------------------------
+
+ESMValCore can be configured to handle native model output
+formats and specific observation data sets without preliminary
+reformatting. You can host such a data source either under a
+dedicated project or under the ``native6`` project. When
+choosing the latter, the configuration involves the following
+steps:
+
+  - allowing ESMValTool to locate the data files:
+
+    - entry ``native6`` of ``config-developer.yml`` should be
+      complemented with sub-entries for ``input_dir`` and ``input_file``
+      that go under a new key representing the
+      data organization (such as ``MY_DATA_ORG``); these sub-entries can
+      use an arbitrary list of ``{placeholders}``. Example:
+
+      .. code-block:: yaml
+
+        native6:
+          ...
+          input_dir:
+             default: 'Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}'
+             MY_DATA_ORG: '{model}/{exp}/{simulation}/{version}/{type}'
+          input_file:
+            default: '*.nc'
+            MY_DATA_ORG: '{simulation}_*.nc'
+          ...
+
+    - if necessary, provide a so-called ``extra facets file``, which
+      helps to cope with, e.g., variable naming issues when finding
+      files. See :ref:`extra_facets` and :download:`this example of
+      such a file for IPSL-CM6
+      <../../esmvalcore/_config/extra_facets/ipslcm-mappings.yml>`.
+
+  - ensuring that ESMValCore gets the right metadata and data out of
+    your data files: this is described in :ref:`fixing_data`.
+
+
 .. _config-ref:
 
 References configuration file
 
@@ -1,7 +1,7 @@
 .. _findingdata:
 
 ************
-Finding data
+Input data
 ************
 
 Overview
@@ -15,10 +15,13 @@ the right data. We will detail below the data finding and retrieval process and
 the input the user needs to specify, giving examples on how to use the data
 finding routine under different scenarios.
 
+Data types
+==========
+
 .. _CMOR-DRS:
 
-CMIP data - CMOR Data Reference Syntax (DRS) and the ESGF
-=========================================================
+CMIP data
+---------
 CMIP data is widely available via the Earth System Grid Federation
 (`ESGF <https://esgf.llnl.gov/>`_) and is accessible to users either
 via download from the ESGF portal or through the ESGF data nodes hosted
@@ -45,6 +48,40 @@ From the ESMValTool user perspective the number of data input parameters is
 optimized to allow for ease of use. We detail this procedure in the next
 section.
 
+Native model data
+-----------------
+Support for native model data that is not formatted according to a CMIP
+data request can be added using the basic
+:ref:`ESMValCore fix procedure <fixing_data>` and has been implemented
+for some models, :ref:`as described here <fixing_native_models>`.
+
+Observational data
+------------------
+Some observational data is retrieved in the same manner as CMIP data, for example
+using the ``OBS`` root path set to:
+
+  .. code-block:: yaml
+
+    OBS: /gws/nopw/j04/esmeval/obsdata-v2
+
+and the dataset:
+
+  .. code-block:: yaml
+
+    - {dataset: ERA-Interim, project: OBS, type: reanaly, version: 1, start_year: 2014, end_year: 2015, tier: 3}
+
+in ``recipe.yml`` in ``datasets`` or ``additional_datasets``, the rules set in
+CMOR-DRS_ are used again and the file will be automatically found:
+
+.. code-block::
+
+  /gws/nopw/j04/esmeval/obsdata-v2/Tier3/ERA-Interim/OBS_ERA-Interim_reanaly_1_Amon_ta_201401-201412.nc
+
+Since observational data are organized in Tiers depending on their level of
+public availability, the ``default`` directory must be structured accordingly
+with sub-directories ``TierX`` (``Tier1``, ``Tier2`` or ``Tier3``), even when
+``drs: default``.
+
 .. _data-retrieval:
 
 Data retrieval
@@ -186,8 +223,8 @@ datasets are listed in any recipe, under either the ``datasets`` and/or
 .. code-block:: yaml
 
   datasets:
-    - {dataset: HadGEM2-CC,  project: CMIP5, exp: historical, ensemble: r1i1p1, start_year: 2001, end_year: 2004}
-    - {dataset: UKESM1-0-LL, project: CMIP6, exp: historical, ensemble: r1i1p1f2, grid: gn, start_year: 2004,  end_year: 2014}
+    - {dataset: HadGEM2-CC, project: CMIP5, exp: historical, ensemble: r1i1p1, start_year: 2001, end_year: 2004}
+    - {dataset: UKESM1-0-LL, project: CMIP6, exp: historical, ensemble: r1i1p1f2, grid: gn, start_year: 2004, end_year: 2014}
 
 ``_data_finder`` will use this information to find data for **all** the variables specified in ``diagnostics/variables``.
 
@@ -208,7 +245,7 @@ and the dataset you need is specified in your ``recipe.yml`` as:
 
 .. code-block:: yaml
 
-  - {dataset: UKESM1-0-LL, project: CMIP6, mip: Amon, exp: historical, grid: gn, ensemble: r1i1p1f2, start_year: 2004,  end_year: 2014}
+  - {dataset: UKESM1-0-LL, project: CMIP6, mip: Amon, exp: historical, grid: gn, ensemble: r1i1p1f2, start_year: 2004, end_year: 2014}
 
 for a variable, e.g.:
 
@@ -244,32 +281,6 @@ and finally, using the file naming definition from CMOR-DRS_ find the file:
 
 .. _observations:
 
-Observational data
-==================
-Observational data is retrieved in the same manner as CMIP data, for example
-using the ``OBS`` root path set to:
-
-  .. code-block:: yaml
-
-    OBS: /gws/nopw/j04/esmeval/obsdata-v2
-
-and the dataset:
-
-  .. code-block:: yaml
-
-    - {dataset: ERA-Interim,  project: OBS,  type: reanaly,  version: 1,  start_year: 2014,  end_year: 2015,  tier: 3}
-
-in ``recipe.yml`` in ``datasets`` or ``additional_datasets``, the rules set in
-CMOR-DRS_ are used again and the file will be automatically found:
-
-.. code-block::
-
-  /gws/nopw/j04/esmeval/obsdata-v2/Tier3/ERA-Interim/OBS_ERA-Interim_reanaly_1_Amon_ta_201401-201412.nc
-
-Since observational data are organized in Tiers depending on their level of
-public availability, the ``default`` directory must be structured accordingly
-with sub-directories ``TierX`` (``Tier1``, ``Tier2`` or ``Tier3``), even when
-``drs: default``.
 
 Data loading
 ============
 
@@ -6,7 +6,7 @@ Getting started
 
 		Installation <install>
     Configuration <configure>
-    Finding data <find_data>
+    Input data <find_data>
     Installed recipes <recipes>
 		Running <run>
 		Output <output>