|
1 | 1 | # Snakemake workflow: `<name>`
|
2 | 2 |
|
3 |
| -[](https://snakemake.github.io) |
4 |
| -[](https://github.com/<owner>/<repo>/actions?query=branch%3Amain+workflow%3ATests) |
5 |
| - |
| 3 | +[](https://snakemake.github.io) |
| 4 | +[](https://github.com/MPUSP/snakemake-workflow-template/actions/workflows/main.yml) |
| 5 | +[](https://docs.conda.io/en/latest/) |
| 6 | +[](https://sylabs.io/docs/) |
| 7 | +[](https://snakemake.github.io/snakemake-workflow-catalog) |
6 | 8 |
|
7 | 9 | A Snakemake workflow for `<description>`
|
8 | 10 |
|
| 11 | +- [Snakemake workflow: `<name>`](#snakemake-workflow-name) |
| 12 | + - [Usage](#usage) |
| 13 | + - [Workflow overview](#workflow-overview) |
| 14 | + - [Running the workflow](#running-the-workflow) |
| 15 | + - [Input data](#input-data) |
| 16 | + - [Execution](#execution) |
| 17 | + - [Parameters](#parameters) |
| 18 | + - [Authors](#authors) |
| 19 | + - [References](#references) |
| 20 | + - [TODO](#todo) |
9 | 21 |
|
10 | 22 | ## Usage
|
11 | 23 |
|
12 | 24 | The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/?usage=<owner>%2F<repo>).
|
13 | 25 |
|
14 |
| -If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this (original) <repo>sitory and its DOI (see above). |
| 26 | +If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this repository or its DOI. |
| 27 | + |
| 28 | +## Workflow overview |
| 29 | + |
| 30 | +This workflow is a best-practice workflow for `<detailed description>`. |
| 31 | +The workflow is built using [snakemake](https://snakemake.readthedocs.io/en/stable/) and consists of the following steps: |
| 32 | + |
| 33 | +1. Parse sample sheet containing sample metadata (`python`) |
| 34 | +2. Simulate short read sequencing data on the fly (`dwgsim`) |
| 35 | +3. Check quality of input read data (`FastQC`) |
| 36 | +4. Trim adapters from input data (`cutadapt`) |
| 37 | +5. Collect statistics from tool output (`MultiQC`) |
| 38 | + |
| 39 | +## Running the workflow |
| 40 | + |
| 41 | +### Input data |
| 42 | + |
| 43 | +This template workflow creates artificial sequencing data in `*.fastq.gz` format. It does not contain actual input data. The simulated input files are nevertheless generated from a mandatory sample sheet whose path is set in the `config.yml` file (default: `.test/samples.tsv`). The sample sheet has the following layout: |
| 44 | + |
| 45 | +| sample | condition | replicate | read1 | read2 | |
| 46 | +| ------- | --------- | --------- | -------------------------- | -------------------------- | |
| 47 | +| sample1 | wild_type | 1 | sample1.bwa.read1.fastq.gz | sample1.bwa.read2.fastq.gz | |
| 48 | +| sample2 | wild_type | 2 | sample2.bwa.read1.fastq.gz | sample2.bwa.read2.fastq.gz | |
| 49 | + |
| 50 | + |
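As an illustration of the layout above, the following minimal sketch reads such a sample sheet with Python's standard library. It assumes the file is tab-separated (as the `.tsv` extension suggests); the function name `parse_samplesheet` and the checks it performs are hypothetical and not taken from the workflow's own code.

```python
import csv
import io

# Columns shown in the sample sheet layout above.
REQUIRED_COLUMNS = ("sample", "condition", "replicate", "read1", "read2")


def parse_samplesheet(handle):
    """Read a tab-separated sample sheet and return a dict keyed by sample name.

    Hypothetical helper for illustration; assumes a header row is present.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sample sheet is missing columns: {missing}")
    samples = {}
    for row in reader:
        name = row["sample"]
        if name in samples:
            raise ValueError(f"duplicate sample name: {name}")
        samples[name] = row
    return samples


# Usage with an in-memory copy of the example table.
EXAMPLE = (
    "sample\tcondition\treplicate\tread1\tread2\n"
    "sample1\twild_type\t1\tsample1.bwa.read1.fastq.gz\tsample1.bwa.read2.fastq.gz\n"
    "sample2\twild_type\t2\tsample2.bwa.read1.fastq.gz\tsample2.bwa.read2.fastq.gz\n"
)
samples = parse_samplesheet(io.StringIO(EXAMPLE))
print(sorted(samples))
```

To read a file on disk instead, pass an open file handle, for example `open(".test/samples.tsv", newline="")`. All values are returned as strings, so `replicate` would need an explicit `int()` conversion if a number is required.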
| 51 | +### Execution |
| 52 | + |
| 53 | +To run the workflow from the command line, change the working directory to the workflow folder. |
| 54 | + |
| 55 | +```bash |
| 56 | +cd path/to/snakemake-workflow-name |
| 57 | +``` |
| 58 | + |
| 59 | +Adjust options in the default config file `config/config.yml`. |
| 60 | +Before running the entire workflow, you can perform a dry run using: |
| 61 | + |
| 62 | +```bash |
| 63 | +snakemake --dry-run |
| 64 | +``` |
| 65 | + |
| 66 | +To run the complete workflow with test files using **conda**, execute the following command. Specifying the number of compute cores is mandatory. |
| 67 | + |
| 68 | +```bash |
| 69 | +snakemake --cores 3 --sdm conda --directory .test |
| 70 | +``` |
| 71 | + |
| 72 | +To run the workflow with **singularity** / **apptainer**, add a link to a container registry in the `Snakefile`, for example: |
| 73 | +`container: "oras://ghcr.io/<user>/<repository>:<version>"` for GitHub's container registry. Run the workflow with: |
| 74 | + |
| 75 | +```bash |
| 76 | +snakemake --cores 3 --sdm conda apptainer --directory .test |
| 77 | +``` |
| 78 | + |
| 79 | +### Parameters |
| 80 | + |
| 81 | +This table lists all parameters that can be used to run the workflow. |
| 82 | + |
| 83 | +| parameter | type | details | default | |
| 84 | +| ------------------ | ---- | --------------------------------------- | --------------------------------------------- | |
| 85 | +| **samplesheet** | | | | |
| 86 | +| path | str | path to samplesheet, mandatory | "config/samples.tsv" | |
| 87 | +| **get_genome** | | | | |
| 88 | +| database | str | one of `manual`, `ncbi` | `ncbi` | |
| 89 | +| assembly | str | RefSeq ID | `GCF_000006785.2` | |
| 90 | +| fasta | str | optional path to fasta file | Null | |
| 91 | +| gff | str | optional path to gff file | Null | |
| 92 | +| gff_source_type | str | list of name/value pairs for GFF source | see config file | |
| 93 | +| **simulate_reads** | | | | |
| 94 | +| read_length | num | length of target reads in bp | 100 | |
| 95 | +| read_number | num | number of total reads to be simulated | 100000 | |
| 96 | +| random_freq | num | frequency of random read sequences | 0.01 | |
| 97 | +| **cutadapt** | | | | |
| 98 | +| threep_adapter | str | sequence of the 3' adapter | `-a ATCGTAGATCGG` | |
| 99 | +| fivep_adapter | str | sequence of the 5' adapter | `-A GATGGCGATAGG` | |
| 100 | +| default | str | additional options passed to `cutadapt` | [`-q 10 `, `-m 25 `, `-M 100`, `--overlap=5`] | |
| 101 | +| **multiqc** | | | | |
| 102 | +| config | str | path to MultiQC config | `config/multiqc_config.yml` | |
| 103 | + |
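To show how the `cutadapt` parameters in the table might fit together, here is a minimal sketch that assembles a paired-end `cutadapt` argument list from the listed defaults. The `config` dictionary, the function `build_cutadapt_args`, and the file names are hypothetical stand-ins; the workflow's actual rule may combine these options differently.

```python
import shlex

# Hypothetical in-memory copy of the `cutadapt` section of the config,
# using the defaults listed in the table above (whitespace trimmed).
config = {
    "cutadapt": {
        "threep_adapter": "-a ATCGTAGATCGG",
        "fivep_adapter": "-A GATGGCGATAGG",
        "default": ["-q 10", "-m 25", "-M 100", "--overlap=5"],
    }
}


def build_cutadapt_args(cfg, read1, read2, out1, out2):
    """Assemble a paired-end cutadapt argument list from config values."""
    params = cfg["cutadapt"]
    args = ["cutadapt"]
    # Each config value is a small option string such as "-q 10";
    # shlex.split turns it into separate argv tokens.
    options = [params["threep_adapter"], params["fivep_adapter"], *params["default"]]
    for option in options:
        args.extend(shlex.split(option))
    # -o / -p name the trimmed output files for read 1 and read 2.
    args.extend(["-o", out1, "-p", out2, read1, read2])
    return args


args = build_cutadapt_args(
    config,
    "sample1.bwa.read1.fastq.gz",
    "sample1.bwa.read2.fastq.gz",
    "trimmed.read1.fastq.gz",
    "trimmed.read2.fastq.gz",
)
print(shlex.join(args))
```

Building a list of tokens rather than one long string keeps each option separate, so the result can be passed directly to `subprocess.run` without shell quoting issues.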
| 104 | +## Authors |
| 105 | + |
| 106 | +- Firstname Lastname |
| 107 | + - Affiliation |
| 108 | + - ORCID profile |
| 109 | + - home page |
| 110 | + |
| 111 | +## References |
| 112 | + |
| 113 | +> Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., & Nahnsen, S. *Sustainable data analysis with Snakemake*. F1000Research, 10:33, **2021**. https://doi.org/10.12688/f1000research.29032.2. |
15 | 114 |
|
16 |
| -# TODO |
| 115 | +## TODO |
17 | 116 |
|
18 | 117 | * Replace `<owner>` and `<repo>` everywhere in the template (also under .github/workflows) with the correct `<repo>` name and owning user or organization.
|
19 | 118 | * Replace `<name>` with the workflow name (can be the same as `<repo>`).
|
20 | 119 | * Replace `<description>` with a description of what the workflow does.
|
| 120 | +* Update the workflow description, parameters, running options, authors and references in the `README.md`. |
| 121 | +* Update the `README.md` badges. Add or remove badges for `conda`/`singularity`/`apptainer` usage depending on the workflow's capability. |
21 | 122 | * The workflow will appear in the snakemake-workflow-catalog once it has been made public. Then the link under "Usage" will point to the usage instructions if `<owner>` and `<repo>` were correctly set.
|