
Sustainability & Reproducibility

Stephan Reichl edited this page Nov 6, 2024 · 6 revisions

To ensure sustainable development, implicit documentation, and reproducibility, each {module} has to fulfill the following requirements/specifications:

  • GitHub repository for development and version control.
    • Descriptive name (i.e., what it does and its purpose, e.g., dea_limma) in snake_case, i.e., words separated by underscores (_).
    • README according to the provided template.
    • Repository structure according to Snakemake's best practice.
    • Releases (i.e., versions) according to the semantic versioning scheme.
    • Workflow rulegraph in workflow/dags/rulegraph.svg, generated with:
      snakemake --rulegraph --forceall | dot -Tsvg > workflow/dags/rulegraph.svg
    • GitHub page displaying the README.
    • LICENSE file (recommendation: MIT).
    • CITATION.cff file.
    • (Optional, but recommended) Add example data and configurations for users as a starting point.
    • (Optional, but recommended) Provide resources and/or external data sources (e.g., reference data) as links, via Zenodo, or via Git Large File Storage (LFS).
  • Zenodo repository to ensure compatibility, citability, and long-term archiving.
    • Via the automated GitHub webhook integration.
    • Every GitHub release will trigger the creation of a new release in the Zenodo repository, and thereby a new version-specific DOI.
    • The Zenodo repository will be annotated using the provided information in the CITATION.cff file in your GitHub repository.
    • There is one permanent DOI that can be used to reference/cite all releases/versions of a given repository. We recommend using this DOI together with the release version for referencing, e.g., in publications.
    • Add the version-specific DOI badge to the top of the GitHub repository.
    • Add the permanent project DOI to the README in the introduction, the methods, at the bottom (Zenodo link), and to the CITATION.cff.
  • Snakemake Workflow Catalog entry to increase visibility and findability.
    • By fulfilling the requirements for Standardized Usage, the workflow will be automatically indexed.
    • Every GitHub release will trigger the catalog entry to be updated.
  • Snakemake Report for implicit documentation and presentation of results.
    • Follow the specified Report structure to enhance reproducibility (via export of used software and configuration) and to ensure module compatibility.
  • Result directory
    • Follow the specified Result structure to enhance reproducibility (via export of used software and configuration) and to ensure module compatibility.
  • Software Management with conda for reproducibility and portability.
    • Specify the exact version of every entry in your conda environment specification files (workflow/envs/*.yaml).
    • For maximal compatibility, define your global workflow dependencies in workflow/envs/global.yaml, containing all software required to execute the Snakefile.
  • Workflow-specific profile
    • Provide a workflow-specific profile in workflow/profiles/default/config.yaml for workflow-specific parameters or resources.
  • Use the min_version directive in your Snakefile:
    ##### set minimum snakemake version #####
    from snakemake.utils import min_version
    min_version("8.20.1")
  • (COMING SOON) Containerization with Docker/Singularity for OS-level virtualization.
    • This final virtualization frontier will be explored and implemented across all MrBiomics modules in the future.
    • Automated containerization has been supported since Snakemake 6.0.0 (released 2021-02-26).
  • Add the {module} to the summary table of all modules in the Modules section of this repository's README.
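Several repository-level requirements above lend themselves to a simple pre-release check. The following Python sketch is hypothetical and not part of MrBiomics: it covers only three items (snake_case repository name, presence of some required files, exact versions in conda environment files). It assumes the README is named README.md and uses a line-based heuristic instead of a real YAML parser, treating name=version or name==version (optionally followed by a build string) as an exact pin.

```python
"""Hypothetical pre-release checker for a MrBiomics-style {module} repository.

A minimal sketch, not an official MrBiomics tool: it checks only a few of the
requirements and uses simple heuristics instead of a YAML parser.
"""
import re
from pathlib import Path

# Files named in the requirements (paths relative to the repository root).
REQUIRED_FILES = [
    "README.md",  # assumption: the README is a Markdown file
    "LICENSE",
    "CITATION.cff",
    "workflow/dags/rulegraph.svg",
    "workflow/envs/global.yaml",
    "workflow/profiles/default/config.yaml",
]

# snake_case: lowercase words of letters/digits joined by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")

# Assumed notion of an "exact" conda pin: name=version or name==version,
# optionally followed by =build; no wildcards or range operators allowed.
PINNED = re.compile(r"[A-Za-z0-9_.:\-]+==?[0-9][A-Za-z0-9_.+\-]*(=[A-Za-z0-9_.]+)?")


def is_snake_case(name: str) -> bool:
    """Return True if the repository name is snake_case (e.g. dea_limma)."""
    return SNAKE_CASE.fullmatch(name) is not None


def missing_required_files(repo_root: Path) -> list[str]:
    """Return the required files that do not exist under repo_root."""
    return [f for f in REQUIRED_FILES if not (repo_root / f).is_file()]


def unpinned_dependencies(env_yaml: str) -> list[str]:
    """Return dependency entries of a conda env file without an exact version.

    Heuristic, line-based parsing: every '- entry' line below 'dependencies:'
    is checked; a nested '- pip:' header is skipped, its entries are checked.
    """
    unpinned = []
    in_deps = False
    for raw in env_yaml.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue  # skip blank and comment-only lines
        if not raw.startswith((" ", "-")):
            # A new top-level key such as 'name:', 'channels:', 'dependencies:'.
            in_deps = line.strip() == "dependencies:"
            continue
        entry = line.strip()
        if in_deps and entry.startswith("- "):
            spec = entry[2:].strip().replace(" ", "")
            if spec.endswith(":"):  # e.g. '- pip:' opens a nested list
                continue
            if PINNED.fullmatch(spec) is None:
                unpinned.append(spec)
    return unpinned
```

For example, unpinned_dependencies(Path("workflow/envs/global.yaml").read_text()) lists the entries that still need an exact version before a release. A real check would parse the YAML properly; the line-based approach only keeps the sketch dependency-free.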

Checklist

  • GitHub repository with README, LICENSE, CITATION.cff, Snakemake Workflow Catalog entry, and conda YAML specifications with exact versions.
  • Zenodo repository via GitHub webhook.
  • GitHub release to trigger Zenodo DOI generation.
  • Add the general (permanent) and version-specific DOIs to the GitHub README, CITATION.cff, and the MrBiomics Modules summary table.
  • Final GitHub release with a minor version bump, including the generated DOI.
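The last checklist step calls for a minor version bump. Under Semantic Versioning 2.0.0, a minor bump increments MINOR and resets PATCH to 0. The hypothetical helper below sketches this for plain MAJOR.MINOR.PATCH release tags with an optional leading "v" (a common GitHub tag convention, assumed here); it deliberately rejects pre-release and build metadata rather than guessing how to handle them.

```python
import re

# Plain MAJOR.MINOR.PATCH with an optional leading "v"; no leading zeros,
# no pre-release or build metadata (both are out of scope for this sketch).
SEMVER_CORE = re.compile(r"v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def bump_minor(tag: str) -> str:
    """Return the next minor version: MINOR + 1, PATCH reset to 0.

    Hypothetical helper; the leading "v" of the input tag is preserved.
    """
    match = SEMVER_CORE.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {tag!r}")
    major, minor, _patch = (int(group) for group in match.groups())
    prefix = "v" if tag.startswith("v") else ""
    return f"{prefix}{major}.{minor + 1}.0"
```

For example, bump_minor("v1.2.3") returns "v1.3.0", which would be the tag of the final release that includes the generated DOI.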