Pipeline for variant imputation in low depth sequencing data using GLIMPSE

CERC-Genomic-Medicine/glimpse_pipeline

Genotype imputation using low depth sequencing data

1. Description

Pipeline for genotype imputation from low-depth sequencing data using GLIMPSE1. The pipeline uses GATK HaplotypeCaller to estimate genotype probabilities at reference sites prior to imputation. It was developed with Nextflow and tested with the SLURM job scheduler.

2. Prerequisites

The following software is required:

  • Singularity
  • Nextflow
  • bcftools
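
As a quick check before going further, the helper below reports which of the required tools are not on your PATH. It is only a sketch: the function name check_tools is made up for this example and is not part of the pipeline, and on a cluster the tools may only appear after the relevant module load commands.

```shell
# Report any required tool that is not on PATH.
# Prints one missing tool name per line and returns non-zero if any are missing.
# (check_tools is a hypothetical helper, not something shipped with the pipeline.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_tools singularity nextflow bcftools
```

An empty output with a zero exit status means all listed tools were found.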

3. Installation

To run this pipeline, you will need to:

  1. Download and install GLIMPSE1
  2. Build GATK singularity container: singularity build gatk_VERSION.sif docker://broadinstitute/gatk:VERSION
  3. Clone this repo: git clone https://github.com/CERC-Genomic-Medicine/glimpse_pipeline.git
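
The VERSION placeholder in step 2 appears twice, once in the image file name and once in the Docker tag. A small wrapper such as the one below keeps the two in sync. This is a sketch only: gatk_build_cmd and the GATK_VERSION variable are hypothetical names, and the GATK version to use is not specified here, so you have to choose it yourself.

```shell
# Print the Singularity build command for a given GATK Docker tag.
# (gatk_build_cmd is a hypothetical helper; it only prints the command from step 2.)
gatk_build_cmd() {
    version=${1:-}
    if [ -z "$version" ]; then
        echo "usage: gatk_build_cmd VERSION" >&2
        return 1
    fi
    printf 'singularity build gatk_%s.sif docker://broadinstitute/gatk:%s\n' \
        "$version" "$version"
}

# Usage (GATK_VERSION is an assumption: set it to the tag you want to use):
#   gatk_build_cmd "$GATK_VERSION"        # inspect the command
#   gatk_build_cmd "$GATK_VERSION" | sh   # run it
```

The resulting .sif path is the value later needed for params.gatkContainer.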

4. Execution

  1. Modify the nextflow.config configuration file:
  • params.reference_vcfs -- path to VCF/BCF files with phased reference panel genotypes. Each VCF/BCF file must have the corresponding tbi/csi index.
  • params.reference_sites_vcfs -- path to sites-only VCF/BCF files of the reference panel. Each VCF/BCF file must have the corresponding tbi/csi index.
  • params.study_bams -- path to BAM/CRAM files. One BAM/CRAM file per study participant. Each BAM/CRAM file must have the corresponding bai/crai index.
  • params.referenceDir -- path to the folder with the reference genome *.fa file.
  • params.referenceGenome -- name of the reference genome *.fa file (e.g. hs37d5.fa).
  • params.gatkContainer -- path to the GATK singularity image file (.sif).
  • params.window_size -- imputation window size in base-pairs. This is a parameter to the GLIMPSE_chunk executable. See GLIMPSE1 documentation for more details.
  • params.buffer_size -- imputation window buffer size in base-pairs. This is a parameter to the GLIMPSE_chunk executable. See GLIMPSE1 documentation for more details.
  • params.chunk_exec -- path to the GLIMPSE_chunk executable.
  • params.phase_exec -- path to the GLIMPSE_phase executable.
  • params.ligate_exec -- path to the GLIMPSE_ligate executable.
  • params.glimpse_maps -- path to the GLIMPSE genetic maps folder for the corresponding human genome build version.
  • process.* and executor.* -- set these arguments according to your compute cluster configuration.
  2. Run the pipeline. Example of an interactive SLURM job:
salloc --time=12:00:00 --ntasks=1 --mem-per-cpu=16G
module load nextflow
module load singularity
module load bcftools
nextflow run Imputation.nf
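
Several of the parameters above require an index file next to each input (tbi/csi for VCF/BCF, bai/crai for BAM/CRAM). The sketch below lists the inputs for which no index is found, so that problems surface before the job is submitted. It rests on assumptions: has_index and report_unindexed are hypothetical helpers, and the index naming (for example sample.bam.bai or panel.vcf.gz.tbi) follows common convention rather than anything confirmed about how the pipeline locates indexes.

```shell
# Return 0 if an index file sits next to the given data file.
# Assumption: indexes are named "<file>.<index-ext>" (e.g. sample.bam.bai,
# panel.vcf.gz.tbi); "sample.bai" is also accepted for BAM files.
has_index() {
    f=$1
    case "$f" in
        *.bam)          [ -e "$f.bai" ] || [ -e "${f%.bam}.bai" ] ;;
        *.cram)         [ -e "$f.crai" ] ;;
        *.vcf.gz|*.bcf) [ -e "$f.tbi" ] || [ -e "$f.csi" ] ;;
        *)              return 2 ;;  # unrecognised file type
    esac
}

# Print every file from the argument list for which no index was found.
# Returns non-zero if at least one file is reported.
report_unindexed() {
    status=0
    for f in "$@"; do
        if ! has_index "$f"; then
            printf 'no index found for: %s\n' "$f"
            status=1
        fi
    done
    return "$status"
}

# Usage (paths are placeholders for your own data):
#   report_unindexed /path/to/study/*.bam /path/to/panel/*.vcf.gz
```

Files with an unrecognised extension are reported as well, since no index can be matched to them.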
