
Bioinformatics workflows developed for and used on the St. Jude Cloud project.


stjudecloud/workflows


This repository contains all bioinformatics workflows used on the St. Jude Cloud project. Officially, the repository is in beta — the project is adding workflows as they are developed and put into production.

Resource requirements have been optimized to minimize failures in our computing environment, but they may not reflect the best settings for your use case. Please tailor these parameters to fit your needs.

Please excuse the state of our documentation. We are working on some big changes around here, and with those changes will come much improved documentation.

Repository Structure

The repository is laid out as follows:

  • workflows/ - Directory containing all end-to-end bioinformatics workflows.
  • tools/ - All tools we have wrapped as individual WDL tasks.
  • data_structures/ - WDL struct definitions and tasks or workflows related to their construction, parsing, or validation.
  • docker/ - Dockerfiles used in our workflows. All Docker images are versioned and published to the GitHub Container Registry as part of our CI.
  • tests/ - Home to all of our testing infrastructure. We use pytest-workflow for validating our code.
  • bin/ - (no longer in use) Scripts used by Cromwell configuration settings. Add this to $PATH prior to using configurations in conf/ with Cromwell.
  • conf/ - (no longer in use) Cromwell configuration files created for various environments that we use across our team. Feel free to use/fork/suggest improvements.

Bootstrap guide

This repository implements workflows using the Workflow Description Language (WDL). If you are unfamiliar with WDL, a short overview is available in the WDL spec.

The workflows and tasks in this repository should require minimal set-up and configuration before you're ready to run. You don't even need to clone the repo! The bare minimum requirements are a locally installed WDL runner and an internet connection.

The exact steps for installation, configuration, and execution will depend on your environment and preferred engine. There are a variety of WDL engines you could use, though our team prefers miniwdl. We also make use of the miniwdl-lsf plugin for running on our LSF cluster.

Most WDL runners are capable of running a WDL file from a URL. This is how we most commonly execute our workflows and tasks. The command below is a mock example of what could be used to submit a run of our rnaseq-standard workflow using miniwdl:

miniwdl run --verbose --input inputs.json https://raw.githubusercontent.com/stjudecloud/workflows/rnaseq-standard/v3.0.1/workflows/rnaseq/rnaseq-standard.wdl
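The URL in the command above appears to follow the pattern of a raw GitHub file path: repository, then a git ref (here the tag rnaseq-standard/v3.0.1), then the path to the WDL file. As a rough sketch, the same command could be assembled programmatically when scripting runs against different tags. The helper names `workflow_url` and `miniwdl_command` are hypothetical and not part of this repository or of miniwdl; only the flags shown in the mock example above are used.

```python
import shlex

# Base of the raw-file URL, taken from the mock example in this README.
RAW_BASE = "https://raw.githubusercontent.com/stjudecloud/workflows"


def workflow_url(ref: str, path: str) -> str:
    """Build the raw GitHub URL for a WDL file at a given git ref.

    `ref` is a tag or branch name (hypothetical helper, for illustration).
    """
    return f"{RAW_BASE}/{ref.strip('/')}/{path.lstrip('/')}"


def miniwdl_command(ref: str, path: str, inputs_json: str = "inputs.json") -> list[str]:
    """Return the argument list for the mock `miniwdl run` invocation."""
    return [
        "miniwdl", "run", "--verbose",
        "--input", inputs_json,
        workflow_url(ref, path),
    ]


if __name__ == "__main__":
    cmd = miniwdl_command(
        "rnaseq-standard/v3.0.1",
        "workflows/rnaseq/rnaseq-standard.wdl",
    )
    # Print the command as a shell-safe string; pass `cmd` to
    # subprocess.run() instead if you want to execute it.
    print(shlex.join(cmd))
```

Pinning the ref to a release tag rather than a branch keeps a run reproducible, since the WDL file behind a tagged URL should not change.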

For an introduction to WDL, there are many guides, one of which is from Terra.

Author

👤 St. Jude Cloud Team

Tests

Every task in this repository is covered by at least one test (see all of our tests in tests/tools/). These are run using pytest-workflow.

The command for running our tests should be executed at the root of the repo: python -m pytest --kwdof --git-aware

🤝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check the issues page. You can also take a look at the contributing guide.

Links worth checking out

The OpenWDL GitHub

Our preferred WDL runner: miniwdl

Most of our tasks are run inside a BioContainers image

Our tasks are validated using pytest-workflow

📝 License

Copyright © 2020-Present St. Jude Cloud Team.
This project is MIT licensed.