-
Notifications
You must be signed in to change notification settings - Fork 0
mattheatley/ngs_pipe
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
READ ME This software helps facilitate the processing of ngs data on a SLURM cluster according to parameters commonly used by Levi Yant's lab. Requirements - python 3.7.3; os, sys, operator, time, argparse, subprocess, re, collections, statistics, itertools - trimmomatic 0.39 - picard 2.21.1 - bwa 0.7.17 - samtools 1.9 - gatk4 4.1.4.0 - vcftools 0.1.16 N.B. If software is installed within a conda environment the environment name must be supplied at each stage (see below). SETUP Create the initial default/user-specified working & sub-directories. python pipe_ngs.py -setup [-p </path/to>] [-d <directory_name>] PREPARE FILES Move reference genome & single/paired-end fastqs to the relevant sub-directories. i.e. /ref.gen genome.fasta /00.fastqs /Sample_1 Reads_1.fastq.gz Reads_2.fastq.gz /Sample_2 Reads.fastq.gz ... N.B. Read type is automatically assigned depending on whether or not ALL fastqs can be assigned a pair member. At present only novogene reads have been tested but similar naming conventions should also work. INDEX Index reference genome. python pipe_ngs.py -index PROCESS DATA Sequentially processes ngs data. python pipe_ngs.py -stage <number> [-x <ploidy>] Stage Description 1 Merge fastqs 2 Trim reads 3 Map reads 4 Mark duplicates 5 Call haplotypes 6 Combine sample 7 Genotype samples 8 Re-combine parts 9 Filter sites 10 Filter depth N.B. Single/paired-end reads will be processed according to inferred read type between stages 1-3. Ploidy only needs to be specified at stage 5. ADDITIONAL SETTINGS Flag Default Description -pa <partition> stage-specific specify SLURM partition -no <nodes> stage-specific specify SLURM nodes -nt <ntasks> stage-specific specify SLURM ntasks -me <memory[units]> stage-specific specify SLURM memory -wt <HH:MM:SS> stage-specific specify SLURM walltime -d <diretory_name> ngs_pipe specify working directory name -p </path/to> home path specify working directory path -u <user_name> current user specify SLURM user name -m <email_address> None specify email address to receive SLURM notifications -e <environment_name> ngs_env specify conda environment with software installed -l <limit> 100 specify concurrent task submission limit -review review stage SLURM accounting data i.e. -stage <number> -review -pipe submit pipeline itself; submits new tasks up to the limit from the hpcc -sb activate sub buddy; automatically re-submits any stalled tasks
About
A pipeline to process NGS data on a SLURM cluster according to GATK best practice
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published