ww-salmon RNA-seq Workflow

Overview

This workflow quantifies transcript expression from RNA-seq reads using Salmon, a fast and accurate tool for transcript-level expression estimation. It is designed to be simple to run while applying commonly recommended Salmon settings (see Under the Hood).

What this Workflow Does

Builds a Salmon index from your reference transcriptome
Quantifies transcript expression for each of your RNA-seq samples
Generates expression matrices that combine the results from all samples

Requirements

Cromwell or another WDL-compatible workflow engine
Docker (the workflow uses the combinelab/salmon container)
Input files:
- Reference transcriptome (FASTA format)
- RNA-seq reads (FASTQ format, can be gzipped)

Quick Start

Download the WDL file, ww-salmon.wdl, from this repository. The repository also provides a template inputs file, ww-salmon-inputs.json.

Create an inputs JSON file (e.g., inputs.json):

{
  "SalmonRnaSeq.transcriptome_fasta": "path/to/transcriptome.fa",
  "SalmonRnaSeq.fastq_r1_files": [
    "path/to/sample1_R1.fastq.gz",
    "path/to/sample2_R1.fastq.gz"
  ],
  "SalmonRnaSeq.fastq_r2_files": [
    "path/to/sample1_R2.fastq.gz",
    "path/to/sample2_R2.fastq.gz"
  ]
}
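With many samples, writing the two FASTQ lists by hand is error-prone. R1 and R2 files are presumably matched by their position in the two arrays, so both lists need the same sample order. The following is a minimal sketch, not part of this repository: `build_inputs` is a hypothetical helper, and it assumes files are named `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` (`.fq`, `.fastq`, and uncompressed variants also match).

```python
import json
import os
import re
import sys

# Assumed naming convention (hypothetical): <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
READ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.f(?:ast)?q(?:\.gz)?$")


def build_inputs(transcriptome_fasta, fastq_paths):
    """Return an inputs dict whose R1 and R2 lists follow the same sample order."""
    reads = {"1": {}, "2": {}}
    for path in fastq_paths:
        match = READ_PATTERN.match(os.path.basename(path))
        if match is None:
            raise ValueError(f"Unrecognized FASTQ name: {path}")
        reads[match["read"]][match["sample"]] = path

    samples = sorted(reads["1"])  # lexicographic order: sample10 sorts before sample2
    inputs = {
        "SalmonRnaSeq.transcriptome_fasta": transcriptome_fasta,
        "SalmonRnaSeq.fastq_r1_files": [reads["1"][s] for s in samples],
    }
    if reads["2"]:
        # Paired-end: every R1 needs a matching R2 and vice versa
        if set(reads["2"]) != set(samples):
            raise ValueError("R1 and R2 files do not cover the same samples")
        inputs["SalmonRnaSeq.fastq_r2_files"] = [reads["2"][s] for s in samples]
    return inputs


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical): python make_inputs.py transcriptome.fa samples/*.fastq.gz > inputs.json
    print(json.dumps(build_inputs(sys.argv[1], sys.argv[2:]), indent=2))
```

If only `_R1` files are passed, the helper omits `fastq_r2_files`, matching the single-end layout.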

Run the workflow with Cromwell:

java -jar cromwell.jar run ww-salmon.wdl -i inputs.json

Input Parameters

| Parameter | Description | Required? |
|---|---|---|
| `transcriptome_fasta` | Reference transcriptome in FASTA format | Yes |
| `fastq_r1_files` | Array of FASTQ files for read 1 (or single-end reads) | Yes |
| `fastq_r2_files` | Array of FASTQ files for read 2 (paired-end data only) | No |
| `salmon_docker` | Docker image for Salmon (default: `combinelab/salmon:latest`) | No |
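The rules in this table can be checked before submitting a run. A minimal sketch, assuming the `SalmonRnaSeq.` prefix from the example inputs and that R1 and R2 files are matched by position (so both arrays must have the same length); `check_inputs` is a hypothetical helper. It deliberately ignores keys not listed above, since the WDL may accept additional inputs.

```python
import json
import sys

PREFIX = "SalmonRnaSeq."  # workflow name as used in the example inputs.json


def check_inputs(inputs):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key in ("transcriptome_fasta", "fastq_r1_files"):
        if PREFIX + key not in inputs:
            problems.append(f"missing required input {PREFIX}{key}")

    r1 = inputs.get(PREFIX + "fastq_r1_files", [])
    r2 = inputs.get(PREFIX + "fastq_r2_files")  # absent for single-end data
    if PREFIX + "fastq_r1_files" in inputs and not r1:
        problems.append("fastq_r1_files is empty")
    if r2 is not None and len(r2) != len(r1):
        # Assumes R1/R2 are paired by array position
        problems.append(
            f"fastq_r2_files has {len(r2)} entries but fastq_r1_files has {len(r1)}"
        )
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python check_inputs.py inputs.json
    with open(sys.argv[1]) as handle:
        issues = check_inputs(json.load(handle))
    for issue in issues:
        print("ERROR:", issue)
    sys.exit(1 if issues else 0)
```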

Outputs

| Output | Description |
|---|---|
| `salmon_index_tar` | Compressed Salmon index (can be reused in future analyses) |
| `salmon_quant_dirs` | Compressed quantification results, one archive per sample |
| `merged_tpm_matrix` | Combined matrix of TPM values for all samples |
| `merged_counts_matrix` | Combined matrix of estimated read counts for all samples |
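The workflow builds the merged matrices itself, and their exact file layout is not described here. As a hedged illustration of what "merging" means, this sketch (hypothetical helpers, not the workflow's code) combines per-sample quant.sf data into a transcripts x samples table. It assumes the standard quant.sf columns `Name`, `Length`, `EffectiveLength`, `TPM`, and `NumReads`.

```python
import csv
import os
import sys


def parse_quant(lines):
    """Parse quant.sf content (any iterable of lines, e.g. an open file)
    into {transcript_name: row_dict}, using the header row for column names."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["Name"]: row for row in reader}


def merge_quants(quants, field):
    """Combine per-sample results into one transcripts x samples table.

    quants: dict mapping sample name -> parse_quant() output; insertion order
            sets the column order.
    field:  quant.sf column to extract, e.g. "TPM" or "NumReads".
    """
    samples = list(quants)
    # Samples quantified against the same index should list the same
    # transcripts; any transcript missing from a sample is filled with 0.0.
    names = sorted(set().union(*quants.values()))
    header = ["Name"] + samples
    rows = [
        [name] + [float(quants[s][name][field]) if name in quants[s] else 0.0
                  for s in samples]
        for name in names
    ]
    return header, rows


def write_tsv(header, rows, out=sys.stdout):
    """Write the table as tab-separated text (stdout by default)."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical): python merge_quants.py TPM s1_quant/quant.sf s2_quant/quant.sf
    field = sys.argv[1]
    quants = {}
    for path in sys.argv[2:]:
        # Sample name = name of the directory that holds quant.sf
        sample = os.path.basename(os.path.dirname(os.path.abspath(path)))
        with open(path, newline="") as handle:
            quants[sample] = parse_quant(handle)
    write_tsv(*merge_quants(quants, field))
```

Passing `"NumReads"` yields a counts table and `"TPM"` a TPM table from the same parsed files.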

Examples

For Paired-End Data

{
  "SalmonRnaSeq.transcriptome_fasta": "references/gencode.v38.transcripts.fa",
  "SalmonRnaSeq.fastq_r1_files": [
    "samples/sample1_R1.fastq.gz",
    "samples/sample2_R1.fastq.gz"
  ],
  "SalmonRnaSeq.fastq_r2_files": [
    "samples/sample1_R2.fastq.gz", 
    "samples/sample2_R2.fastq.gz"
  ]
}

For Single-End Data

{
  "SalmonRnaSeq.transcriptome_fasta": "references/gencode.v38.transcripts.fa",
  "SalmonRnaSeq.fastq_r1_files": [
    "samples/sample1.fastq.gz",
    "samples/sample2.fastq.gz"
  ]
}

Common Questions

How do I get a transcriptome file?

You can download reference transcriptomes from:

GENCODE (human/mouse)
Ensembl (many species)
UCSC Genome Browser

For human, a common choice is the GENCODE reference:

wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.transcripts.fa.gz
gunzip gencode.v38.transcripts.fa.gz
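GENCODE transcript FASTA headers are long, pipe-delimited records (transcript ID, gene ID, names, length, biotype). `salmon index` has a `--gencode` flag that trims names at the first `|`, but this README does not say whether the workflow passes it. If transcript names in your results come out as full headers, one option is to shorten the headers before running the workflow. A minimal sketch; `shorten_fasta` and the script name are hypothetical.

```python
import gzip
import sys


def shorten_header(line):
    """Keep only the first '|'-separated field of a FASTA header line."""
    if line.startswith(">"):
        return line.split("|", 1)[0].rstrip("\n") + "\n"
    return line  # sequence lines pass through unchanged


def shorten_fasta(src, dst):
    """Copy FASTA src to dst with shortened headers; .gz input is read transparently."""
    opener = gzip.open if src.endswith(".gz") else open
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(shorten_header(line))


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical): python shorten_fasta.py gencode.v38.transcripts.fa.gz transcripts.short.fa
    shorten_fasta(sys.argv[1], sys.argv[2])
```

This discards the gene names carried in the other header fields, so keep the original file if you need them for annotation later.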

How do I extract the results?

The workflow provides compressed output directories for each sample. To extract a specific sample's results:

tar -xzf sample1_quant.tar.gz

This will create a directory sample1_quant containing Salmon's output files, including:

quant.sf: The main quantification results file
logs/: Directory containing Salmon log files
lib_format_counts.json: Information about the library type
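To load a sample's quant.sf in Python without unpacking the archive first, the standard-library `tarfile` module can read the file directly. This sketch assumes each archive contains a single file named quant.sf, as in the sample1_quant layout described here; `read_quant_from_tar` is a hypothetical helper name.

```python
import csv
import io
import tarfile


def read_quant_from_tar(tar_path):
    """Return the rows of quant.sf from a (possibly gzipped) tar archive as dicts."""
    with tarfile.open(tar_path, "r:*") as tar:  # "r:*" auto-detects compression
        member = next(
            (m for m in tar.getmembers()
             if m.isfile() and m.name.rsplit("/", 1)[-1] == "quant.sf"),
            None,
        )
        if member is None:
            raise FileNotFoundError(f"no quant.sf in {tar_path}")
        raw = tar.extractfile(member)
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        try:
            return list(csv.DictReader(text, delimiter="\t"))
        finally:
            text.close()  # also closes the underlying member stream


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        # Usage (hypothetical): python read_quant.py sample1_quant.tar.gz
        rows = read_quant_from_tar(sys.argv[1])
        print(len(rows), "transcripts")
```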

What are TPM and counts?

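TPM (Transcripts Per Million): Expression values normalized for transcript length and sequencing depth, suitable for comparing relative expression levels between samples
counts: Estimated number of reads/fragments originating from each transcript (often non-integer, because Salmon distributes multi-mapping reads among transcripts), suitable as input for differential expression analysis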
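TPM can be derived from the estimated counts: each transcript's count N_i is divided by its effective length L_i, and these rates are rescaled to sum to one million within the sample, TPM_i = 10^6 x (N_i / L_i) / sum_j (N_j / L_j). In quant.sf terms, N is `NumReads` and L is `EffectiveLength`. This is the standard definition, shown as an illustration rather than a re-implementation of Salmon's internals. It also shows why equal counts do not imply equal TPM.

```python
def tpm(num_reads, effective_lengths):
    """Convert per-transcript counts to TPM: 1e6 * (N_i / L_i) / sum_j (N_j / L_j)."""
    rates = [n / length for n, length in zip(num_reads, effective_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)  # no reads assigned in this sample
    return [1e6 * r / total for r in rates]


if __name__ == "__main__":
    # Two transcripts with the same count; the second is half as long,
    # so it receives twice the TPM.
    print(tpm([100, 100], [1000, 500]))
```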

Under the Hood

This workflow:

Creates a Salmon index from your transcriptome
Quantifies each sample with the following Salmon settings:
- Automatic library type detection
- GC bias correction
- Sequence-specific bias correction
- Mapping validation (selective alignment)
Combines the per-sample results into TPM and counts matrices labeled by sample name
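The settings above correspond to standard Salmon command-line options: `-l A` (automatic library type), `--gcBias`, `--seqBias`, and `--validateMappings` (selective alignment is the default in Salmon 1.x, where the flag is still accepted). The sketch below builds plausible `salmon index` and `salmon quant` command lines under that assumption; it is not copied from ww-salmon.wdl, whose exact commands and thread settings may differ, and the helper names are hypothetical.

```python
import shlex


def salmon_index_cmd(transcriptome_fasta, index_dir, threads=4):
    """Build a `salmon index` command line."""
    return ["salmon", "index", "-t", transcriptome_fasta, "-i", index_dir,
            "-p", str(threads)]


def salmon_quant_cmd(index_dir, r1, out_dir, r2=None, threads=4):
    """Build a `salmon quant` command with the settings listed above.

    r2=None selects single-end mode (-r); otherwise reads are passed as -1/-2.
    """
    cmd = ["salmon", "quant", "-i", index_dir,
           "-l", "A"]  # automatic library type detection
    if r2 is None:
        cmd += ["-r", r1]
    else:
        cmd += ["-1", r1, "-2", r2]
    cmd += [
        "--validateMappings",  # mapping validation (selective alignment)
        "--gcBias",            # GC bias correction
        "--seqBias",           # sequence-specific bias correction
        "-p", str(threads),
        "-o", out_dir,
    ]
    return cmd


if __name__ == "__main__":
    # Print the commands; pass the lists to subprocess.run(cmd, check=True) to execute them
    print(shlex.join(salmon_index_cmd("transcriptome.fa", "salmon_index")))
    print(shlex.join(salmon_quant_cmd("salmon_index", "sample1_R1.fastq.gz",
                                      "sample1_quant", r2="sample1_R2.fastq.gz")))
```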

Troubleshooting

Error: "Docker image not found"

Solution: Ensure Docker is installed and running, and that the Salmon image (combinelab/salmon, or the image set in salmon_docker) can be pulled

Error: "File not found"

Solution: Check that every path in your inputs.json file is spelled correctly and accessible to the workflow engine

Error: "Memory allocation failed"

Solution: Increase the memory_gb runtime parameter of the failing task in the WDL file
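A quick local check before launching can catch mistyped paths, a common cause of "File not found". This hypothetical script checks only the three file inputs from the Input Parameters table, skips anything that looks like a URI (for example gs:// or s3:// paths, which `os.path` cannot check), and resolves relative paths against the directory you run it from, which may differ from how your workflow engine resolves them.

```python
import json
import os
import sys

# File-valued inputs from the Input Parameters table (workflow name as in inputs.json)
FILE_KEYS = (
    "SalmonRnaSeq.transcriptome_fasta",
    "SalmonRnaSeq.fastq_r1_files",
    "SalmonRnaSeq.fastq_r2_files",
)


def missing_paths(inputs):
    """Return (input_name, path) pairs for local paths that do not exist."""
    missing = []
    for key in FILE_KEYS:
        value = inputs.get(key)
        if value is None:
            continue  # e.g. fastq_r2_files is omitted for single-end data
        for path in value if isinstance(value, list) else [value]:
            if "://" in path:
                continue  # remote URI; cannot be checked locally
            if not os.path.exists(path):
                missing.append((key, path))
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python check_paths.py inputs.json
    with open(sys.argv[1]) as handle:
        problems = missing_paths(json.load(handle))
    for key, path in problems:
        print(f"{key}: {path} not found")
    sys.exit(1 if problems else 0)
```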

Advanced Customization

For settings that are not exposed as workflow inputs, modify the WDL file directly.

Edit the runtime parameters at the task level:

runtime {
    docker: docker_image
    memory: "~{memory_gb} GB"
    cpu: cpu
    disks: "local-disk ~{disk_size_gb} SSD"
    preemptible: 1
}

Add further Salmon parameters to the command section of the relevant task if needed.
