Provenance human health and clinical data and update tools file (#369)
* Add provenance tools to tool_and_resource_list.yml

* Update human-clinical-and-health-data.md
EvaGarciaAlvarez authored Oct 29, 2024
1 parent f6248b7 commit caa0760
Showing 2 changed files with 53 additions and 24 deletions.
36 changes: 36 additions & 0 deletions _data/tool_and_resource_list.yml
@@ -1252,3 +1252,39 @@
id: sdmx-ri
name: SDMX-Reference Infrastructure (SDMX-RI)
url: https://sdmx.org/?page_id=4666
- description: Java implementation of the PROV data model.
id: provtoolbox
name: ProvToolbox
url: https://github.com/lucmoreau/ProvToolbox
- description: Python implementation of the PROV data model.
id: provpython
name: Prov python
url: https://pypi.org/project/prov/
- description: Set of user-friendly web applications for storing, validating, and translating W3C PROV-based provenance representations.
id: openprovenance
name: OpenProvenance
url: https://openprovenance.org/
- description: A prototype of a provenance management service implementing the CPM (ISO 23494-2).
id: provenance-storage
name: Provenance storage
url: https://is.muni.cz/th/mo8f1/
- description: Collect meta-data from scripts written in the R programming language.
id: provr
name: provR
url: https://github.com/ProvTools/provR
- description: An R library to collect provenance from R scripts.
id: rdatatracker
name: RDataTracker
url: https://github.com/End-to-end-provenance/RDataTracker
- description: The noWorkflow project aims at allowing scientists to benefit from provenance data analysis even when they don't use a workflow system.
id: noworkflow
name: noWorkFlow
url: https://github.com/gems-uff/noworkflow
- description: Provenance tracking for R
id: recordr
name: recordr
url: https://github.com/NCEAS/recordr
- description: CamFlow is a Linux Security Module (LSM) designed to capture data provenance for the purpose of system audit
id: camflow
name: camflow
url: https://camflow.org/
41 changes: 17 additions & 24 deletions provenance/human-clinical-and-health-data.md
@@ -1,7 +1,7 @@
---
title: Human clinical and health data
description: Tracking data and analysis steps.
contributors: [Rudolf Wittner, Stian Soiland-Reyes, Simone Leo]
contributors: [Rudolf Wittner]
page_id: hchd_provenance
redirect_from: /human-clinical-and-health-data/provenance
rdmkit:
@@ -11,15 +11,18 @@ training:
- name: FAIR data and provenance with RO-Crate and Galaxy
registry:
url: https://gallantries.github.io/video-library/modules/ro-crate
- name: Galaxy training
registry:
url: https://bio.tools/galaxy
- name: Common Workflow Language (CWL) training
registry:
url: https://fairsharing.org/FAIRsharing.8y5ayx
- name: Snakemake training
registry:
url: https://tess.elixir-europe.org/search?q=Snakemake
# More information on how to fill in this metadata section can be found here https://www.infectious-diseases-toolkit.org/contribute/page-metadata
---

## W3C PROV

[W3C PROV](https://www.w3.org/TR/prov-overview/) is a general purpose standard for provenance information. The standard suggests expression of provenance in terms of entities, activities, agents, and their mutual relations. The standard's data model is realized in different serializations, including the [PROV-O ontology](https://www.w3.org/TR/prov-o/), which have been extended for various domains.

In addition to the [PROV primer](https://www.w3.org/TR/prov-primer/), the [PROV Book](https://www.provbook.org/) gives a detailed introduction to using PROV.
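
The entity, activity, and agent pattern described above can be illustrated with a minimal sketch. It uses only the Python standard library and a dictionary layout loosely modelled on the PROV-JSON serialization; the `ex:` identifiers, the `https://example.org/` namespace, and the helper names `build_provenance` and `inputs_of` are hypothetical and chosen purely for illustration.

```python
import json


def build_provenance():
    """Return a tiny provenance record as a plain dict (PROV-JSON-like layout).

    All identifiers below are made up for illustration.
    """
    return {
        "prefix": {"ex": "https://example.org/"},
        # Entities: the things whose provenance is described.
        "entity": {"ex:raw-samples": {}, "ex:report": {}},
        # Activities: processes that use and generate entities.
        "activity": {"ex:analysis": {}},
        # Agents: people or software responsible for activities.
        "agent": {"ex:analyst": {}},
        # Relations linking the three kinds of records.
        "used": {
            "_:u1": {"prov:activity": "ex:analysis", "prov:entity": "ex:raw-samples"}
        },
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": "ex:report", "prov:activity": "ex:analysis"}
        },
        "wasAssociatedWith": {
            "_:a1": {"prov:activity": "ex:analysis", "prov:agent": "ex:analyst"}
        },
    }


def inputs_of(doc, entity_id):
    """List the entities used by the activities that generated `entity_id`."""
    generating = {
        rel["prov:activity"]
        for rel in doc.get("wasGeneratedBy", {}).values()
        if rel["prov:entity"] == entity_id
    }
    return sorted(
        rel["prov:entity"]
        for rel in doc.get("used", {}).values()
        if rel["prov:activity"] in generating
    )


if __name__ == "__main__":
    doc = build_provenance()
    print(json.dumps(doc, indent=2))
    print(inputs_of(doc, "ex:report"))
```

For real use, a dedicated PROV library would normally be preferable to hand-built dictionaries, since it can validate records and convert between serializations.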

## HL7 FHIR Provenance

[HL7 FHIR](http://hl7.org/fhir/) is an interoperability standard for healthcare information exchange between systems. FHIR aims to define the key entities involved in healthcare information exchange as resources.
@@ -28,20 +31,10 @@ FHIR provides support for [expression of provenance](https://www.hl7.org/fhir/pr

The provenance part of HL7 FHIR extends W3C PROV.

## The Common Provenance Model

The [Common Provenance Model](https://doi.org/10.1038/s41597-022-01537-6) (CPM) is an extension of W3C PROV that aims to provide support for the integration of provenance information from heterogeneous environments. In particular, it provides guidelines for the representation of domain-independent provenance information (provenance _backbone_), to which domain-specific provenance information can be attached in a prescribed way.

The CPM forms a conceptual foundation for the ISO standard series _ISO 23494 Provenance information model for biological specimen and data_. The ISO standard is still in an early phase of its development.

## RO-Crate

{% tool "research-object-crate" %} is a lightweight implementation of a _FAIR Digital Object_, which is able to pack data together with its metadata into a _Research Object_. It is based on Linked Data standards including {% tool "schema-org" %} and [JSON-LD](https://json-ld.org/), but can be written and consumed as regular JSON.

The [RO-Crate specifications](https://www.researchobject.org/ro-crate/specification.html) can be used to form different [RO-Crate profiles](https://www.researchobject.org/ro-crate/profiles.html), which are suitable for various domains and use cases. While the base specifications already contain some [guidelines on representing the provenance of data entities](https://www.researchobject.org/ro-crate/1.1/provenance.html#software-used-to-create-files) included in the crate, some contexts require a more detailed description to enhance traceability and reproducibility. To meet this demand, several provenance-oriented RO-Crate profiles are being developed:

* The [Workflow Run RO-Crate working group](https://www.researchobject.org/workflow-run-crate/) is developing a collection of [profiles to describe the execution of computational workflows](https://www.researchobject.org/workflow-run-crate/profiles/). The profiles define provenance descriptions at different granularity levels, from "black box" (only workflow-level inputs, outputs and parameters are considered) to step-by-step rundown.

* The CPM team, with the help of the RO-Crate community, is developing an RO-Crate profile for representing CPM-compliant provenance and meta-provenance in an RO-Crate.

Support for RO-Crate provenance reporting is being added or is planned to be added to several workflow engines, including {% tool "galaxy" %}, {% tool "common-workflow-language" %}, {% tool "snakemake" %}, {% tool "streamflow" %}, {% tool "sapporo-wes" %}, {% tool "compss" %}, {% tool "wfexs" %}.
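
As a rough illustration of the base RO-Crate provenance guideline mentioned above (recording the software used to create a file), the sketch below assembles a small `ro-crate-metadata.json` document with the Python standard library. The file paths, the tool URL, and the helper names `make_crate` and `find_by_type` are hypothetical; the overall layout (metadata descriptor, root dataset, and a `CreateAction` with `instrument`, `object`, and `result`) follows our reading of the RO-Crate 1.1 specification and should be checked against it before use.

```python
import json


def make_crate(input_file, output_file, tool_url):
    """Build a minimal RO-Crate metadata document as a plain dict."""
    graph = [
        {
            # Metadata descriptor: identifies this file as RO-Crate metadata.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # Root dataset listing the files packed in the crate.
            "@id": "./",
            "@type": "Dataset",
            "hasPart": [{"@id": input_file}, {"@id": output_file}],
        },
        {"@id": input_file, "@type": "File"},
        {"@id": output_file, "@type": "File"},
        {"@id": tool_url, "@type": "SoftwareApplication", "name": "Example tool"},
        {
            # Provenance: the tool (instrument) turned the input into the output.
            "@id": "#run-1",
            "@type": "CreateAction",
            "instrument": {"@id": tool_url},
            "object": {"@id": input_file},
            "result": {"@id": output_file},
        },
    ]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


def find_by_type(crate, type_name):
    """Return all graph entries whose @type equals `type_name`."""
    return [item for item in crate["@graph"] if item.get("@type") == type_name]


if __name__ == "__main__":
    crate = make_crate(
        "data/input.csv",
        "results/summary.csv",
        "https://example.org/tools/summarise",  # hypothetical tool URL
    )
    print(json.dumps(crate, indent=2))
```

The provenance-oriented profiles listed above go further than this single action, for example by describing workflow-level parameters or individual steps.
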
## Relevant tools
* {% tool "galaxy" %}: Open, web-based platform for data intensive biomedical research. Whether on the free public server or your own instance, you can perform, reproduce, and share complete analyses.
* {% tool "common-workflow-language" %}: An open standard for describing workflows that are built from command-line tools.
* {% tool "snakemake" %}: Snakemake is a framework for data analysis workflow execution
* {% tool "streamflow" %}: Container-native workflow manager for hybrid infrastructures
* {% tool "sapporo-wes" %}: Implementation of Workflow Execution Service (WES) or so-called Workflow-as-a-Service.
* {% tool "compss" %}: COMP Superscalar (COMPSs) is a task-based programming model which aims to ease the development of applications for distributed infrastructures, such as large High-Performance clusters (HPC), clouds and container managed clusters.
