Hail analysis of 1000 Genomes VCF

Description

This repository written in PySpark contains the process to process a VCF with the the Broad Institute's Hail platform. Hail is written on top of Apache Spark, and as of this writing, is on its second version, which allows the processing of multiple VCFs simultaneously.

Usage

Process 1000 Genomes VCF.

Prelim: set up hail context and Spark. download your vcf.

Step 1:

load vcf

Step 2:

split multiallelic variants

Step 3:

run VEP

Step 4:

explore data

Result: Your vcf file is loaded and annotated

Hail GWAS tutorial includes:

Loading data

Variant annotations

QC metrics

running the GWAS

PCA

Regression

Rare variant analysis

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Hail analysis of 1000 Genomes VCF

Description

Usage

Files

README.md

Latest commit

History

README.md

File metadata and controls

Hail analysis of 1000 Genomes VCF

Description

Usage