AndersenLab/isotype-nf

isotype-nf

The isotype-nf pipeline calls isotype groups from strain-level wild isolate variant data and outputs the isotype assignments along with related information. These isotypes can then be used in downstream analyses, including variant calling restricted to isotype reference strains with wi-gatk-nf.

This page details how to run the pipeline. You can also find more information on the Andersen Lab dry guide.

Pipeline overview

_____   ______     ___    _________  ____  ____  _______  ________       ____  _____  ________  
|_   _|.' ____ \  .'   `. |  _   _  ||_  _||_  _||_   __ \|_   __  |     |_   \|_   _||_   __  | 
  | |  | (___ \_|/  .-.  \|_/ | | \_|  \ \  / /    | |__) | | |_ \_|______ |   \ | |    | |_ \_| 
  | |   _.____`. | |   | |    | |       \ \/ /     |  ___/  |  _| _|______|| |\ \| |    |  _|    
 _| |_ | \____) |\  `-'  /   _| |_      _|  |_    _| |_    _| |__/ |      _| |_\   |_  _| |_     
|_____| \______.' `.___.'   |_____|    |______|  |_____|  |________|     |_____|\____||_____|    
                                                                                               

To run the pipeline:

nextflow main.nf --help
nextflow main.nf --debug
nextflow main.nf --vcf_file=/path/to/vcf_file --species c_elegans -output-dir=/path/to/output
nextflow main.nf --vcf_file=/path/to/vcf_file --bam_location=/path/to/bams --previous_isotypes=/path/to/previous_isotype_file -output-dir=/path/to/output

    parameters                 description                           Set/Default
    ==========                 ===========                           ========================
    --debug                    Use --debug to indicate debug mode    false
    --vcf_file                 All strains VCF file                  null

    --species                  Species to call isotypes from         null
    and / or
    --cutoff                   Concordance cutoff for isotype calls  null
    --bam_location             Directory of BAM files                null
    --previous_isotypes        File containing previous isotypes     null
    
    username                                                         null

	HELP: http://andersenlab.org/dry-guide/pipelines/pipeline-isotype/

Software requirements

  • Nextflow v24+ (see the dry guide on Nextflow or the Nextflow documentation). On Rockfish, you can access this version by loading the nf24_env conda environment before running the pipeline command:
module load python/anaconda
source activate /data/eande106/software/conda_envs/nf24_env
  • Singularity. On Rockfish, load it with module load singularity before running the pipeline.

Usage

Testing on Rockfish

This command runs the pipeline on a test dataset:

nextflow run -latest andersenlab/isotype-nf --debug

Running on Rockfish

You should run this in a screen or tmux session.

Note: if you are having issues running Nextflow or need reminders, check out the Nextflow page.

nextflow main.nf --vcf_file=/path/to/vcf_file --species c_elegans -output-dir=/path/to/output

or

nextflow main.nf --vcf_file=/path/to/vcf_file --bam_location=/path/to/bams --previous_isotypes=/path/to/previous_isotype_file -output-dir=/path/to/output

Parameters

-profile

There are three configuration profiles for this pipeline; only the default is described here.

  • rockfish - Used for running on Rockfish (default)

Note

If you do not specify a -profile, the rockfish profile is used by default.

--vcf_file

The VCF file for isotype calling is the output from the wi-gatk pipeline. The VCF file must be gzipped, and you must provide the **full path to the VCF file** (even if it is in your current directory).

Note

Remember that in --debug mode the pipeline will use the vcf file located in test_data/vcf.gz.
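
The two requirements on --vcf_file (gzipped, full path) can be checked before launching. The following is a minimal sketch, not part of the pipeline: resolve_vcf is a hypothetical helper name and strains.vcf.gz is a placeholder file name.

```shell
# Hypothetical helper (not part of isotype-nf): print the absolute path of a
# gzipped VCF, or fail with a message on stderr if the file is missing or is
# not gzip-compressed.
resolve_vcf() {
    vcf="$1"
    if [ ! -f "$vcf" ]; then
        echo "error: $vcf does not exist" >&2
        return 1
    fi
    # gzip -t tests the integrity of the compressed stream.
    if ! gzip -t "$vcf" 2>/dev/null; then
        echo "error: $vcf is not gzip-compressed" >&2
        return 1
    fi
    # Build an absolute path from the file's directory and base name.
    dir=$(cd "$(dirname "$vcf")" && pwd) || return 1
    echo "$dir/$(basename "$vcf")"
}

# Usage (placeholder file name):
#   nextflow main.nf --vcf_file="$(resolve_vcf strains.vcf.gz)" --species c_elegans
```

If the check fails, the command substitution is empty, so it is safer to store the result in a variable and test it before calling nextflow.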

--debug (optional)

You should use --debug for testing/debugging purposes. This will run the debug test set (located in the test_data folder) using your specified configuration profile (e.g. rockfish).

For example:

nextflow run -latest andersenlab/isotype-nf --debug -resume

Using --debug automatically sets the VCF file to test_data/vcf.gz.

--species (optional if --bam_location, --cutoff, and --previous_isotypes are specified)

Must be "c_elegans", "c_briggsae" or "c_tropicalis". The species automatically selects the cutoff, BAM folder, and previous isotype groups. Manually specifying any of these overrides the default for the given species. If all of them are specified manually, --species does not affect the workflow.
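
A wrapper script can reject a mistyped species name before Nextflow starts. This is a minimal sketch; is_supported_species is a hypothetical helper, and the pipeline may perform its own validation.

```shell
# Hypothetical pre-flight check (not part of isotype-nf): succeed only for
# the three species names listed in this README.
is_supported_species() {
    case "$1" in
        c_elegans|c_briggsae|c_tropicalis) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   is_supported_species "$species" || echo "unsupported species: $species" >&2
```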

--bam_location (optional if --species is specified)

Location of the BAM folder containing coverage records for each strain. If omitted, it is inferred from --species.

--previous_isotypes (optional if --species is specified)

Location of the previous isotype group calls file. If omitted, it is inferred from --species. This file should be tab-separated with at least 3 columns: strain, isotype, and isotype_ref_strain.
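
A quick way to confirm that a previous-isotypes file has the required columns is to inspect its first row. The sketch below assumes the file has a header row whose column names are exactly strain, isotype, and isotype_ref_strain; check_isotype_header is a hypothetical helper name.

```shell
# Hypothetical check (not part of isotype-nf): succeed only if the
# tab-separated header row contains the three required column names,
# in any position.
check_isotype_header() {
    head -n 1 "$1" | awk -F '\t' '
        {
            for (i = 1; i <= NF; i++) seen[$i] = 1
            ok = ("strain" in seen) && ("isotype" in seen) && ("isotype_ref_strain" in seen)
        }
        END { exit ok ? 0 : 1 }
    '
}

# Usage:
#   check_isotype_header /path/to/previous_isotype_file || echo "missing required columns" >&2
```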

--cutoff (optional if --species is specified)

The minimum concordance required for isotype clustering. If omitted, it is inferred from --species.

-output-dir (optional)

A directory in which to output results.

default = WI-{today's date}, where the date is formatted as YYYYMMDD
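
To pass an explicit directory that follows the same naming pattern, the date can be formatted in the shell. This is a sketch of the naming convention only; default_outdir is a hypothetical helper, and how the pipeline itself builds the name is not shown here.

```shell
# Hypothetical helper: build a WI-YYYYMMDD directory name from today's date.
default_outdir() {
    echo "WI-$(date +%Y%m%d)"
}

# Usage:
#   nextflow main.nf --vcf_file=/path/to/vcf_file --species c_elegans -output-dir="$(default_outdir)"
```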

Output

├── gtcheck.tsv
├── isotype_groups.tsv
├── wi_isotype_sample_sheets.txt
└── isotype_comparison.pdf

The output files are described below.

  • gtcheck.tsv - Contains the pairwise comparisons of valid genotypes for every pair of strains.
  • isotype_groups.tsv - A tab-separated file with strains, their isotype group, and the isotype reference strain.
  • wi_isotype_sample_sheets.txt - A sample sheet for rerunning the wi-gatk pipeline for isotype reference strains only.
  • isotype_comparison.pdf - A set of plots showing joins and splits in the new isotype groups compared to the previous isotype groups.
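
As a quick sanity check on the results, the number of strains in each isotype group can be tallied from isotype_groups.tsv. The sketch below assumes the file has a header row with a column named isotype; that column name is an assumption and should be checked against the actual file. count_strains_per_isotype is a hypothetical helper.

```shell
# Hypothetical summary (not part of isotype-nf): print
# "isotype<TAB>number_of_strains" for each isotype group, sorted by name.
# The isotype column is located by its header name rather than by position.
count_strains_per_isotype() {
    awk -F '\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "isotype") col = i; next }
        col     { n[$col]++ }
        END     { for (k in n) printf "%s\t%d\n", k, n[k] }
    ' "$1" | sort
}

# Usage:
#   count_strains_per_isotype /path/to/output/isotype_groups.tsv
```

If no isotype column is found in the header, the function prints nothing.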

Relevant Docker Images

  • andersenlab/numpy: This Docker image is built within this pipeline using GitHub Actions. Whenever env/Dockerfile or .github/workflows/build_docker.yml changes, GitHub Actions builds a new Docker image and pushes it if the build succeeds.
