**phoenix** is a bioinformatics pipeline that ...

- Given a BAM, unpacks the BAM into FASTQs
- Given xenografts, disambiguates between mouse and human reads
- If `skip_trimming` is `false` (default), trims FASTQ reads with TrimGalore
- Uses the typical alignment pipeline provided by `nf-core/subworkflows/fastq_align_bwa`, then `MarkDuplicates`
- Read QC (`FastQC`)
- Present QC for raw reads (`MultiQC`)
> **Note**: If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow.
First, prepare a samplesheet with your input data that looks as follows:

`samplesheet.csv`:

```csv
sample,is_pdx,fastq_1,fastq_2
CONTROL_REP1,true,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
```

Each row represents a pair of FASTQ files (paired end).
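If you have many samples, a small shell loop can generate the samplesheet for you. The sketch below is not part of the pipeline; it assumes paired-end files under a hypothetical `/path/to/fastqs` directory following the `_R1_001.fastq.gz`/`_R2_001.fastq.gz` naming shown above, derives the sample name naively from the file name, and marks every sample as PDX. Adjust to your data:

```bash
#!/usr/bin/env bash
# Build samplesheet.csv from a directory of paired-end FASTQs (illustrative sketch).
echo "sample,is_pdx,fastq_1,fastq_2" > samplesheet.csv
for r1 in /path/to/fastqs/*_R1_001.fastq.gz; do
  r2=${r1/_R1_001/_R2_001}                 # matching R2 file
  sample=$(basename "$r1" | cut -d_ -f1)   # naive: text before the first underscore
  echo "${sample},true,${r1},${r2}" >> samplesheet.csv
done
```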
Similarly, a samplesheet containing BAM input data is accepted. It should look as follows:

`samplesheet_bam.csv`:

```csv
sample,is_pdx,bam
CONTROL_REP1,true,my_data.bam
```
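A similar sketch works for BAM input, assuming one BAM per sample named `<sample>.bam` under a hypothetical `/path/to/bams` directory, with every sample marked as PDX:

```bash
#!/usr/bin/env bash
# Build samplesheet_bam.csv from a directory of BAMs (illustrative sketch).
echo "sample,is_pdx,bam" > samplesheet_bam.csv
for bam in /path/to/bams/*.bam; do
  echo "$(basename "$bam" .bam),true,${bam}" >> samplesheet_bam.csv
done
```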
For use with MSKCC's JUNO cluster, set these environment variables:

```bash
export NXF_SINGULARITY_CACHEDIR=/juno/work/ci/singularity_cachedir_nxf
module load java/jdk-11.0.11
module load singularity/3.7.1
```
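These settings apply only to your current shell session. One option, sketched below with a hypothetical file name `juno_env.sh`, is to keep them in a small file and source it before each run:

```bash
# Save the JUNO environment setup once, then source it in each new session.
cat > juno_env.sh <<'EOF'
export NXF_SINGULARITY_CACHEDIR=/juno/work/ci/singularity_cachedir_nxf
module load java/jdk-11.0.11
module load singularity/3.7.1
EOF

source juno_env.sh
```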
Now, you can run the pipeline using:
nextflow run main.nf \
-profile juno,singularity \
<--skip_trimming> \ # Omit if you want TrimGalore to run
<--input samplesheet.csv AND/OR --input_bam samplesheet_bam.csv> \
--outdir <OUTDIR>
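For instance, a concrete run on JUNO using the two samplesheets above might look like the following; the `results` output directory is an assumption, and `--skip_trimming` is included here, so TrimGalore is skipped:

```bash
nextflow run main.nf \
    -profile juno,singularity \
    --skip_trimming \
    --input samplesheet.csv \
    --input_bam samplesheet_bam.csv \
    --outdir results
```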
Finally, edit `conf/resources.config` to include the required reference genomes and the directories of the corresponding BWA indexes:

```groovy
// conf/resources.config
params {
    fasta_href     = "/path/to/human/genome/genome.fa"
    bwa_index_href = "/path/to/human/genome"   // BWA index is usually in the same location as genome.fa
    fasta_mref     = "/path/to/mouse/genome/genome.fa"
    bwa_index_mref = "/path/to/mouse/genome"   // BWA index is usually in the same location as genome.fa
}
```
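The pipeline needs a BWA index for each reference. If one does not already exist, a common way to build it next to the FASTA (assuming `bwa` is available on your PATH, and using the example paths from the config above) is:

```bash
# Creates the index files (.amb, .ann, .bwt, .pac, .sa) alongside each FASTA.
bwa index /path/to/human/genome/genome.fa
bwa index /path/to/mouse/genome/genome.fa
```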
Now, you can run the pipeline using:

```bash
nextflow run main.nf \
    -profile resources,<docker/singularity/.../institute> \
    <--skip_trimming> \
    <--input samplesheet.csv AND/OR --input_bam samplesheet_bam.csv> \
    --outdir <OUTDIR>
```

Omit `--skip_trimming` if you want TrimGalore to run.
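As a concrete sketch, a run outside JUNO with Docker, assuming `conf/resources.config` has been edited as above and a hypothetical `results` output directory, might look like:

```bash
nextflow run main.nf \
    -profile resources,docker \
    --input samplesheet.csv \
    --outdir results
```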
> **Note**: `samplesheet.csv` can contain both PDX and non-PDX samples.

> **Note**: You can use the input arguments `--input` and `--input_bam` individually or at the same time when running `phoenix`.
> **Warning**: Please provide pipeline parameters via the CLI or the Nextflow `-params-file` option. Custom config files, including those provided by the `-c` Nextflow option, can be used to provide any configuration except for parameters; see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
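As a sketch of the `-params-file` approach, the parameters from the examples above could be collected in a YAML file (the name `params.yaml` is a hypothetical choice) and passed in place of the CLI flags:

```bash
cat > params.yaml <<'EOF'
input: samplesheet.csv
input_bam: samplesheet_bam.csv
outdir: results
skip_trimming: true
EOF

nextflow run main.nf -profile juno,singularity -params-file params.yaml
```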
For more details, please refer to the usage documentation and the parameter documentation.
All output paths are relative to the `OUTDIR` directory defined by the input parameter `--outdir`.

Final BAM outputs are placed in the subdirectory `bam/`.

FastQC and MultiQC outputs are located in the `fastqc/` and `multiqc/` directories, respectively.

Intermediate outputs are placed in the subdirectory `intermediate/`. These files should be safe to delete once `-resume` is no longer needed.
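For example, assuming the run used `--outdir results`, the intermediate files could be removed with:

```bash
# Reclaim space once you no longer need to re-run the pipeline with -resume.
rm -rf results/intermediate
```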
Pipeline run information, such as the software versions used in the pipeline, can be found in `pipeline_info/`.