mskcc/phoenix

Introduction

phoenix is a bioinformatics pipeline that takes FASTQ and/or BAM input, optionally disambiguates mouse and human reads for xenograft (PDX) samples, aligns reads with BWA, marks duplicates, and reports QC. In brief, it:

  1. Given BAM input, unpacks each BAM back into FASTQs
  2. Given xenografts (PDX samples), disambiguates between mouse and human reads
  3. If skip_trimming is false (the default), trims FASTQ reads with Trim Galore
  4. Aligns reads with the nf-core fastq_align_bwa subworkflow, then runs MarkDuplicates
  5. Performs read QC (FastQC)
  6. Presents aggregated QC for raw reads (MultiQC)

Usage

Note: If you are new to Nextflow and nf-core, please refer to the nf-core documentation on how to set up Nextflow.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,is_pdx,fastq_1,fastq_2
CONTROL_REP1,true,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz

Each row represents a pair of FASTQ files (paired end); the is_pdx column marks whether the sample is a patient-derived xenograft (PDX).
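
For example, a samplesheet that mixes a PDX sample with a non-PDX sample might look like this (sample and file names are purely illustrative):

sample,is_pdx,fastq_1,fastq_2
PDX_TUMOR_1,true,PDX_TUMOR_1_S1_L001_R1_001.fastq.gz,PDX_TUMOR_1_S1_L001_R2_001.fastq.gz
PATIENT_NORMAL_1,false,PATIENT_NORMAL_1_S2_L001_R1_001.fastq.gz,PATIENT_NORMAL_1_S2_L001_R2_001.fastq.gz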

Similarly, a samplesheet containing BAM input is accepted. It should look as follows:

samplesheet_bam.csv:

sample,is_pdx,bam
CONTROL_REP1,true,my_data.bam

JUNO Config

For use on MSKCC's JUNO cluster, set the Singularity cache directory and load the required modules:

export NXF_SINGULARITY_CACHEDIR=/juno/work/ci/singularity_cachedir_nxf

module load java/jdk-11.0.11
module load singularity/3.7.1

Now, you can run the pipeline using:

nextflow run main.nf \
   -profile juno,singularity \
   --input <SAMPLESHEET_CSV> \
   --input_bam <SAMPLESHEET_BAM_CSV> \
   --outdir <OUTDIR>

Pass --input, --input_bam, or both. Add --skip_trimming to skip read trimming; omit it if you want TrimGalore to run.
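
For example, a run on JUNO using the samplesheets above, with trimming enabled (so --skip_trimming omitted), might look like the following sketch; the output directory name is illustrative:

nextflow run main.nf \
   -profile juno,singularity \
   --input samplesheet.csv \
   --input_bam samplesheet_bam.csv \
   --outdir results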

General Use

First, edit conf/resources.config to point to the required human and mouse reference genomes and the directories containing the corresponding bwa indices.

// conf/resources.config
params {
    fasta_href = "/path/to/human/genome/genome.fa"
    bwa_index_href = "/path/to/human/genome"   // bwa index usually same location as genome.fa

    fasta_mref = "/path/to/mouse/genome/genome.fa"
    bwa_index_mref = "/path/to/mouse/genome"   // bwa index usually same location as genome.fa
}

Now, you can run the pipeline using:

nextflow run main.nf \
   -profile resources,<docker/singularity/.../institute> \
   --input <SAMPLESHEET_CSV> \
   --input_bam <SAMPLESHEET_BAM_CSV> \
   --outdir <OUTDIR>

As above, pass --input, --input_bam, or both, and add --skip_trimming to skip read trimming (omit it if you want TrimGalore to run).

NOTE: samplesheet.csv can contain both PDX and non-PDX samples.

NOTE: The input arguments --input and --input_bam can be used individually or together when running phoenix.

Warning: Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided with the -c Nextflow option, can be used to provide any configuration except for parameters; see the docs.
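
As a minimal sketch of the -params-file option, parameters can be collected in a params file instead of being passed on the command line; the values below are placeholders:

# params.yaml (illustrative)
input: "samplesheet.csv"
input_bam: "samplesheet_bam.csv"
outdir: "results"
skip_trimming: false

nextflow run main.nf -profile juno,singularity -params-file params.yaml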

For more details, please refer to the usage documentation and the parameter documentation.

Pipeline output

All output paths are relative to the directory defined by the --outdir input parameter.

Final BAM outputs are placed in the subdirectory bam/.

FastQC and MultiQC outputs are located in the fastqc and multiqc directories, respectively.

Intermediate outputs are placed in the subdirectory intermediate/. These files should be safe to delete if -resume is no longer needed.

Pipeline run information, such as the software versions used in the pipeline, can be found in pipeline_info/.
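
As a rough sketch, an output directory is therefore organized as follows (the exact contents of each subdirectory depend on the run):

<OUTDIR>/
├── bam/              # final BAMs
├── fastqc/           # per-sample FastQC reports
├── multiqc/          # aggregated MultiQC report
├── intermediate/     # intermediate files, safe to delete once -resume is no longer needed
└── pipeline_info/    # software versions and other run information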