UC 5: Validation of Phytosociological Methods through Occurrence Cubes

About this Use Case

Use Case 5 (UC5) aims to refine and validate habitat prediction methods by combining species occurrence data from scientific sources, museum collections, and citizen science with environmental variables in machine learning models. By integrating plant species occurrences from the Global Biodiversity Information Facility (GBIF) with climatic and topographic data from Earth Observation (EO) sources, UC5 seeks to improve habitat mapping accuracy within Europe and to highlight the usefulness of data sources such as GBIF. Our approach, built on the EUNIS habitat case study 'S22', focuses on comparing predicted species distributions with existing habitat prediction maps, particularly those from the European Nature Information System (EUNIS), and on exploring how effectively data cubes improve habitat classification accuracy. The ultimate goal is to improve our understanding of habitat dynamics and promote a more data-driven approach to habitat prediction, with applications in biodiversity monitoring, conservation planning, and museum collections management.

Research Questions

Which environmental factors influence the distribution and community formation of plant species in European habitats (Habitat case study S22)?
Can integrating species occurrence data from databases like GBIF, including scientific and citizen science sources, and environmental factors retrieved from EO sources improve habitat prediction accuracy, especially within the context of the EUNIS system?
How do the predictions of species distribution compare with the existing EUNIS habitat probability map, and where do discrepancies arise?

Methodology

Use Case 5 integrates species occurrence data, environmental predictors, and machine learning (ML) models to improve habitat prediction accuracy across Europe. The EUNIS habitat type S22 serves as the case study on which the entire modelling framework is built. To test our approach, we collected occurrence data from the Global Biodiversity Information Facility (GBIF) for the eight diagnostic species (including taxonomic synonyms) associated with habitat S22. These records were combined with a suite of topographic and climatic variables derived from Earth Observation sources (elevation, slope, TRI, TWI, HLI, Aridity Index (AI), temperature, seasonal temperature, precipitation, and seasonal precipitation), sourced from platforms such as Copernicus, WorldClim, and CHELSA Bioclim. All datasets were cleaned, harmonised, and integrated into spatially aligned data cubes at two spatial resolutions: 1 km across Europe and 100 m for the Alps region. To enable supervised learning, pseudo-absence data were generated using a 1 km disk buffer method and combined with GBIF presence data.
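The pseudo-absence step can be illustrated with a minimal sketch. This is not the repository's R script: it is a standalone Python illustration, and it assumes that the "1 km disk buffer" means that candidate points closer than 1 km to any presence are rejected (the text does not say whether the disk excludes or bounds the sampling area). The function names, the bounding box, and the use of projected coordinates in metres are all hypothetical.

```python
import math
import random


def sample_pseudo_absences(presences, bbox, n, min_dist_m=1000.0, seed=0,
                           max_tries=100_000):
    """Draw n random points in bbox at least min_dist_m from every presence.

    presences: list of (x, y) in a projected CRS with metre units (assumption).
    bbox: (xmin, ymin, xmax, ymax) in the same units (hypothetical study area).
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bbox
    accepted = []
    tries = 0
    while len(accepted) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place enough pseudo-absences")
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        # Assumed rule: reject candidates inside the 1 km disk of any presence.
        if all(math.hypot(x - px, y - py) >= min_dist_m for px, py in presences):
            accepted.append((x, y))
    return accepted


def build_training_table(presences, pseudo_absences):
    """Label presences 1 and pseudo-absences 0 for supervised learning."""
    return ([(x, y, 1) for x, y in presences]
            + [(x, y, 0) for x, y in pseudo_absences])


# Toy example with two presence records in a 20 km x 20 km window.
presences = [(0.0, 0.0), (5000.0, 5000.0)]
pseudo = sample_pseudo_absences(presences, bbox=(-10000, -10000, 10000, 10000), n=20)
table = build_training_table(presences, pseudo)
```

A fixed seed keeps the sample reproducible, which helps when the same presence and pseudo-absence table has to be reused across several models.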

Species distribution models (SDMs) were developed using an ensemble ML pipeline composed of Generalised Linear Models (GLM), Generalised Additive Models (GAM), and Random Forest (RF). These models produce species occurrence probability maps, which are then combined into a single ensemble prediction. Each model's contribution is weighted by its True Skill Statistic (TSS), calculated through 10-fold cross-validation, stratified by presence–absence data.
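As a rough illustration of the weighting described above, the sketch below computes the True Skill Statistic (sensitivity + specificity - 1) and uses it to combine per-model probability maps. It is a Python illustration, not the repository's R pipeline. The 0.5 threshold, the clipping of negative TSS values to zero, and the normalisation of weights to sum to one are assumptions made here; the source only states that each model is weighted by its TSS.

```python
def true_skill_statistic(observed, predicted_prob, threshold=0.5):
    """TSS = sensitivity + specificity - 1, for 0/1 observations."""
    tp = fn = tn = fp = 0
    for obs, prob in zip(observed, predicted_prob):
        pred = 1 if prob >= threshold else 0  # threshold is an assumption
        if obs == 1 and pred == 1:
            tp += 1
        elif obs == 1 and pred == 0:
            fn += 1
        elif obs == 0 and pred == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one presence and one absence")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def tss_weighted_ensemble(prob_maps, tss_scores):
    """Cell-wise weighted mean of model probabilities, weights proportional to TSS.

    prob_maps: dict of model name -> list of probabilities on the same grid.
    tss_scores: dict of model name -> TSS (e.g. averaged over CV folds).
    """
    # Assumption: models with TSS <= 0 (no better than chance) get zero weight.
    weights = {name: max(tss_scores[name], 0.0) for name in prob_maps}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all models have non-positive TSS")
    n_cells = len(next(iter(prob_maps.values())))
    return [
        sum(weights[name] * prob_maps[name][i] for name in prob_maps) / total
        for i in range(n_cells)
    ]


# Toy example: three models, two grid cells, hypothetical TSS values.
prob_maps = {"GLM": [0.2, 0.8], "GAM": [0.4, 0.6], "RF": [0.0, 1.0]}
tss_scores = {"GLM": 0.5, "GAM": 0.25, "RF": 0.25}
ensemble = tss_weighted_ensemble(prob_maps, tss_scores)
```

In a cross-validated setting, `tss_scores` would typically hold one summary value per model, for instance the mean TSS over the ten folds.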

Model predictions were validated using the EUNIS-ESy habitat distribution maps derived from the EVA database, which served as an independent reference for habitat S22 at 1 km resolution. Lastly, UC5 results were compared to the official EUNIS probability map for habitat S22 to assess the alignment and potential improvements offered by the species-based modelling approach. This two-tiered evaluation, against both an observational reference (EVA) and a modelled reference (EUNIS probability), provides insights into the accuracy and robustness of the predictions and highlights areas where the ensemble model may refine or complement existing habitat mapping efforts.
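One simple way to locate discrepancies between a predicted map and a reference map is a cell-by-cell comparison after thresholding. The sketch below is a Python illustration under several assumptions that the source does not state: both maps are already aligned on the same grid and flattened to lists, missing cells are encoded as `None`, and a 0.5 threshold separates suitable from unsuitable cells. It is not the logic of the repository's comparison script.

```python
def compare_maps(predicted, reference, threshold=0.5):
    """Count agreement categories between two aligned probability maps.

    Returns (counts, agreement), where agreement is the share of valid cells
    on which both maps fall on the same side of the threshold.
    """
    counts = {"both": 0, "predicted_only": 0, "reference_only": 0, "neither": 0}
    for pred, ref in zip(predicted, reference):
        if pred is None or ref is None:
            continue  # skip cells without data in either map
        pred_suitable = pred >= threshold
        ref_suitable = ref >= threshold
        if pred_suitable and ref_suitable:
            counts["both"] += 1
        elif pred_suitable:
            counts["predicted_only"] += 1
        elif ref_suitable:
            counts["reference_only"] += 1
        else:
            counts["neither"] += 1
    valid = sum(counts.values())
    agreement = (counts["both"] + counts["neither"]) / valid if valid else float("nan")
    return counts, agreement


# Toy example: five cells, the last one has no prediction.
predicted = [0.9, 0.7, 0.2, 0.1, None]
reference = [0.8, 0.3, 0.6, 0.2, 0.9]
counts, agreement = compare_maps(predicted, reference)
```

The `predicted_only` and `reference_only` cells are the discrepancies worth inspecting on a map, since they show where the species-based ensemble and the reference disagree.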

UC5 Outputs

UC5 outputs include:

Aggregated species distribution maps produced by an ensemble of individual models (GLM, GAM, RF), each weighted by its TSS value.
A comparative analysis of predicted species distributions against existing EUNIS habitat maps, with recommendations from the UC5 experience for improving habitat classification accuracy.
R scripts that can be reused for further research and to support decision-making in biodiversity conservation.

Getting Started with Our Approach

The following R scripts are available in the 'Notebooks' directory of the UC5 GitHub repository:

Download and clean GBIF occurrence data (GETDATA_GBIF.R)
Integrate GBIF occurrence data with environmental predictors (elevation, slope, TPI, TWI, HLI, AI, temperature, seasonal temperature, precipitation, seasonal precipitation) from EO sources at 1 km (Europe) and 100 m (Alps) resolutions (S1_data_100m.R, S1_data_1km.R)
Generate pseudo-absence data from GBIF presence-only occurrence data, using a disk buffer method with a 1 km radius (create_pseudo-absence_based_on_occurrence_presence_data.R)
Predict species distributions with an ensemble model built from individual models (GLM, GAM, RF) weighted by their TSS scores (Ensemble_model_workflow.R)
Compare the UC5 predicted areas with the EUNIS probability map of habitat S22 (compare_with_EUNIS.R)

Folders and files

Notebooks
user-friendly-script
.gitignore
Docs
LICENSE
Model_statistics_all_outputs.csv
README.md
S22_final_prediction_100m.tif
S22_final_prediction_1km.tif
S22_occurences.csv
S22df_datasource_grouped.csv

Repository: FAIRiCUBE/uc5-occurence-cubes