Skip to content

Latest commit

 

History

History
127 lines (64 loc) · 10.6 KB

README.md

File metadata and controls

127 lines (64 loc) · 10.6 KB

ECOPOTENTIAL IRIS SDM model/service

Authors: João Gonçalves, Salvador Arenas-Castro & João Honrado

Porto, 2018

Description

Main objectives of the IRIS SDM service

IRIS-SDM stands for "Infrastructure for Running, Inspecting and Summarizing Species Distribution Models". This ECOPOTENTIAL VirtualLab service implements a ‘generic’ version of R's biomod2 package for ensemble Species Distribution Modelling (SDM) (Thuiller et al. 2009, Thuiller et al. 2014) used to obtain spatiotemporal predictions of species distributions or habitat suitability. The service aims at generality so that any species can be modelled, based on any set of input variables, parameter settings and/or study areas. This is possible due to IRIS SDM's generic code structures. The Docker container used to run IRIS SDM can be found at this link.

The demo version of IRIS SDM

The demonstration (demo version) of IRIS SDM allows predicting the spatial distribution of suitable habitat conditions for a narrowly distributed, endemic and endangered plant species named Iris boissieri (Gerês lily or 'Yellowbeard Iris') based on satellite-derived Ecosystem Functional Attributes (EFAs). The study area encompasses the species entire distribution located in the NW portion of the Iberian Peninsula. It is based on pre-existing in-field records for the species presence as well as on a set of predictors from satellite-derived Ecosystem Functioning Attributes, namely the annual dynamics (average for the period 2001-2016) of three components related to carbon gains (EVI), sensible heat (LST) and radiative balance (albedo). For more details about the modelling approach check the paper by Arenas-Castro et al. (2018).

Input Data and Format

IRIS SDM requires three generic inputs for running:

  1. a csv file named input_records.csv containing presence-only records for the target species. The two first columns of this file contain the X and Y coordinates in the same coordinate reference system of the input raster predictor variables in vars.zip (see below);

  2. a csv file named parameters.csv containing a set of (hyper)parameter values for running the R biomod2 package ensemble modelling approach;

  3. a zip file named vars.zip containing a set of GeoTIFF raster files used as predictor variables and therefore with a known linkage to the target species' distribution and habitat suitability. These data must have the same coordinate reference system of the input presence records.

For more information on how to prepare the data and the parameter file, check the demo version data at this link.

EO data usage in the demo version: MODIS derived Ecosystem Functional Attributes (EFA) calculated from time series of spectral vegetation indices (MOD13Q1, MOD09A1), land surface temperature (MOD11A1) and albedo (MCD43A1, MCD43B3).

Other data sources in the demo version: Species occurrence data for the demo version were collected from pre-existing datasets and/or from dedicated in-field surveys; the algorithm runs with confirmed presence records complemented with pseudo-absences randomly generated by biomod2. However, confirmed absence records may also be used for model calibration. Climatic/bioclimatic data extracted from WorldClim and land cover data from Corine Land Cover (CLC) were also tested/used for comparative purposes in Arenas-Castro et al. (2018) but are not included in the demo version - these are optional inputs because the model can fiited exclusively on EO variables.

Output Data and Format

Outputs generated by the service are model-based, spatially-explicit predictions for the distribution of the target species in raster format and GeoTIFF format. Outputs also include tables (in .csv format) containing a summary of model performances for partial and ensemble models as well as an importance ranking for each predictor variable.

Outputs include the following files:

  • Output log file (plain txt): this file includes log information about the model/service run namely a list of input files, input data summaries and the progress messages from the biomod2 package. This file allows checking if the model was sucessfully run or if any problem occurred at run time;

  • Main output archive (zip file): this file includes the bulk of output data produced by the service and more specifically by biomod2 package. This includes files with fitted models, raster projections for partial and the final ensemble model and summary tables with model performances and variable importance rankings.

Running the demo version

  • Go to ECOP's VirtualLab site (here);

  • Start by checking if the API key is correctly inserted:

ECOP VLab API Key

The input interface should look like this:

VLab IRIS SDM inputs

Then press "RUN" to initiate the model. Use option "CHECK RUN" to verify the status of the service.

Running the service with other settings

  • Upload the input data (raster covariates zip file, presence records and the parameter file both in csv file) to an accessible web-based cloud service, HTTP or FTP server;

  • Get the correct URLs to each input file required by IRIS SDM;

  • Go to ECOP's VirtualLab site (here);

  • Check if the API key is correctly inserted (otherwise it is not possible to run the model);

  • Insert the URLs in the VLAb interface and make sure that all inputs are filled in;

  • Then press "RUN" to initiate the model. Use option "CHECK RUN" to verify the status of the service.

ECOPOTENTIAL products used

Currently, the demo version of the service uses several ECOPOTENTIAL products related to satellite-derived Ecosystem Functioning Attributes, namely the annual dynamics (average for the period 2001-2016) of three components linked to carbon gains (EVI), sensible heat (LST) and radiative balance (albedo) computed in the Google Earth Engine platform by project partners.

Validation

Model performance is evaluated by holdout cross-validation and through the calculation of several threshold-dependent metrics (e.g., Cohen Kappa, True skill-statistic) and threshold-independent metrics (Area Under the Receiver Operating Curve) calculated from the confusion matrix for the test dataset. Formal validation with independent data depends on in-field collection of occurrence data in relevant future dates, for which spatial projections of habitat suitability, can be produced. For more info check biomod2 package help files.

Transferability

The demo version of the service currently provides spatial predictions of habitat suitability of an endemic and narrow-ranged species (Iris boissieri), exclusive to NW Iberian Peninsula, for which spatial transferability is not required to other ECOPOTENTIAL sites. The demo service also provides spatial predictions for the entire species range (NW Iberia). Although issues related to the model temporal transferability for the demo species were not specifically addressed, it is reasonable to use the same model to make predictions to other (past or future) dates using the same set of predictors computed with the same methodology. However, it is reasonable to assume that uncertainty will increase as time goes further from the calibration interval (2001-2016).

Despite the specificities of modelling the species of the service's demo version, with its particular distribution range and ecological requirements, the service should be capable of providing relevant outputs for any other target species, different settings and/or environmental conditions.

Known limitations

  • At this stage, the model only provides predictions for calibration/fitting conditions. This means that making projections for other spatial and/or temporal conditions is not (yet) possible;

  • By default, IRIS SDM assumes that a presence-only approach is used (but users can select the mode of generating pseudo-absences);

  • QA/QC was not formally assessed.

Notes

  • The Docker container used to run IRIS SDM (here) currently runs R version 3.5.0 (2018-04-23) -- "Joy in Playing" / Platform: x86_64-pc-linux-gnu (64-bit);

  • This container currently enables running Maxent models, with maxent.jar located in the ./output folder (by default). For this reason, maxent.jar path should not be changed.

Programming language

R (mainly including the raster and biomod2 packages, along with their dependencies) and Linux bash.

Relevant references, materials or publications

Arenas-Castro, S., Gonçalves, J., Alves, P., Alcaraz-Segura, D., & Honrado, J. P. (2018). Assessing the multi-scale predictive ability of ecosystem functional attributes for species distribution modelling. PloS one, 13(6), e0199292.

Carvalho-Santos, C., Monteiro, A., Arenas-Castro, S., Greifeneder, F., Marcos, B., Portela, A., & Honrado, J. (2018). Ecosystem Services in a Protected Mountain Range of Portugal: Satellite-Based Products for State and Trend Analysis. Remote Sensing, 10(10), 1573.

Thuiller, W., Lafourcade, B., Engler, R., & Araújo, M. B. (2009). BIOMOD–a platform for ensemble forecasting of species distributions. Ecography, 32(3), 369-373.

Thuiller, W., Georges, D., Engler, R., 2014. biomod2: Ensemble platform for species distribution modeling, R package version 3.1-48, URL: http://CRAN.R-project.org/package=biomod2.

Intellectual Property Rights (IPR)

IPR are under the CC-BY-NC 2.0 license.

ECOPOTENTIAL Partner

ICETA / CIBIO / InBIO - Instituto de Ciências, Tecnologias e Agroambiente, University of Porto, Portugal link-1 link-2 link-3