Hail analysis of 1000 Genomes VCF

Description

This repository contains a PySpark workflow for processing a VCF with the Broad Institute's Hail platform. Hail is built on top of Apache Spark and, as of this writing, is on its second version, which allows multiple VCFs to be processed simultaneously.

Usage

Process a 1000 Genomes VCF.

Prelim: Set up the Hail context and Spark, then download your VCF.
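A minimal sketch of the setup step, assuming Hail's second version is installed (for example with pip) and Spark runs in local mode. The function name `start_hail`, the application name, and the log path are illustrative choices, not taken from this repository.

```python
def start_hail(app_name="hail-1000genomes", log_path="hail.log"):
    """Start Hail on Spark and return the hail module for later calls."""
    # Imported lazily so this file can be loaded without Hail installed.
    import hail as hl

    # hl.init starts (or attaches to) a Spark context; with no master given,
    # Hail runs Spark locally on all available cores.
    hl.init(app_name=app_name, log=log_path)
    return hl
```

Calling `hl = start_hail()` once at the top of a notebook is enough; the later steps reuse the same session.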

Step 1:

Load the VCF.
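Loading might look like the sketch below. It assumes the VCF is aligned to GRCh37 and that a `.gz` file is block-compressed (bgzip), which is how tabix-indexed VCFs are normally distributed. The helper names and both paths are hypothetical.

```python
def vcf_import_options(vcf_path, reference_genome="GRCh37"):
    """Build keyword arguments for hl.import_vcf from the file name."""
    options = {"reference_genome": reference_genome}
    # Hail treats .bgz as block-compressed. A .gz file that is really
    # bgzipped needs force_bgz=True so it can be read in parallel.
    if vcf_path.endswith(".gz"):
        options["force_bgz"] = True
    return options


def load_vcf(vcf_path, mt_path, reference_genome="GRCh37"):
    """Import a VCF, persist it as a MatrixTable, and read it back."""
    import hail as hl  # lazy import; requires a running Hail session

    mt = hl.import_vcf(vcf_path, **vcf_import_options(vcf_path, reference_genome))
    # Writing to Hail's native format avoids re-parsing the VCF on every query.
    mt.write(mt_path, overwrite=True)
    return hl.read_matrix_table(mt_path)
```

If the file is aligned to GRCh38, pass `reference_genome="GRCh38"` instead.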

Step 2:

Split multiallelic variants.
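To illustrate what splitting does, the sketch below pairs a small pure-Python model with the Hail call. `split_record` is a hypothetical teaching helper, not part of Hail. It shows only the one-row-per-alternate-allele idea and the `a_index` and `was_split` fields that `hl.split_multi_hts` adds. Hail additionally reduces alleles to their minimal representation and updates genotype fields, which this model ignores.

```python
def split_record(alleles):
    """Pure-Python illustration: emit one biallelic row per alternate allele."""
    ref, alts = alleles[0], alleles[1:]
    was_split = len(alts) > 1
    return [
        {"alleles": [ref, alt], "a_index": index, "was_split": was_split}
        for index, alt in enumerate(alts, start=1)
    ]


def split_multiallelics(mt):
    """Split multiallelic rows of a MatrixTable that uses the standard HTS genotype schema."""
    import hail as hl  # lazy import; requires a running Hail session

    return hl.split_multi_hts(mt)
```

For example, `split_record(["A", "C", "T"])` produces two rows, A/C with `a_index` 1 and A/T with `a_index` 2, both marked `was_split`.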

Step 3:

Run VEP.
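A hedged sketch of the annotation step. `hl.vep` needs VEP installed on every Spark worker and a JSON configuration file describing how to invoke it; the config path is a placeholder supplied by the caller. The `most_severe_consequence` field is an assumption: it exists only if the schema in your VEP configuration defines it.

```python
def annotate_with_vep(mt, vep_config_path):
    """Add a `vep` row field with Variant Effect Predictor annotations."""
    import hail as hl  # lazy import; requires a running Hail session

    return hl.vep(mt, config=vep_config_path)


def count_consequences(mt):
    """Tally variants by most severe consequence (assumes that field exists)."""
    import hail as hl

    # hl.agg.counter returns a dict mapping each distinct value to its count.
    return mt.aggregate_rows(hl.agg.counter(mt.vep.most_severe_consequence))
```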

Step 4:

Explore the data.
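A first look at the data could be sketched as follows. The function name `explore` and the default of five rows are illustrative; the calls print to the notebook or console rather than returning tables.

```python
def explore(mt, n_rows=5):
    """Print the schema, a few variants, and a variant summary; return the counts."""
    import hail as hl  # lazy import; requires a running Hail session

    mt.describe()                        # global, column, row, and entry schemas
    n_variants, n_samples = mt.count()   # (rows, columns)
    mt.rows().select().show(n_rows)      # row keys only: locus and alleles
    hl.summarize_variants(mt)            # counts by allele type and contig
    return n_variants, n_samples
```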

Result: Your VCF file is loaded and annotated.

The Hail GWAS tutorial notebook (Hail_GWAS_tutorial.ipynb) covers:

Loading data

Variant annotations

QC metrics

Running the GWAS

PCA

Regression

Rare variant analysis
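The QC, PCA, and regression topics listed above can be condensed into one sketch. It assumes a MatrixTable with split (biallelic) variants, a `GT` entry field, the default sample key `s`, and a numeric top-level column field holding the phenotype. The function name, the `phenotype_field` parameter, and the thresholds are illustrative and not taken from the notebook. Rare variant analysis is not covered here.

```python
def run_gwas(mt, phenotype_field, n_pcs=3,
             min_call_rate=0.97, min_af=0.01, min_hwe_p=1e-6):
    """Filter on QC metrics, compute PCs, and run a linear-regression GWAS."""
    import hail as hl  # lazy import; requires a running Hail session

    # Sample QC: drop samples with a low genotype call rate.
    mt = hl.sample_qc(mt)
    mt = mt.filter_cols(mt.sample_qc.call_rate >= min_call_rate)

    # Variant QC: keep variants above the alternate-allele frequency
    # threshold that are not far from Hardy-Weinberg equilibrium.
    mt = hl.variant_qc(mt)
    mt = mt.filter_rows(
        (mt.variant_qc.AF[1] > min_af) & (mt.variant_qc.p_value_hwe > min_hwe_p)
    )

    # PCA on HWE-normalized genotypes; scores are keyed by sample ID.
    _, scores, _ = hl.hwe_normalized_pca(mt.GT, k=n_pcs)
    mt = mt.annotate_cols(scores=scores[mt.s].scores)

    # Regress the phenotype on alternate-allele count, with an intercept
    # and the principal components as covariates.
    covariates = [1.0] + [mt.scores[i] for i in range(n_pcs)]
    return hl.linear_regression_rows(
        y=mt[phenotype_field],
        x=mt.GT.n_alt_alleles(),
        covariates=covariates,
    )
```

The returned table has one row per variant with fields such as `beta` and `p_value`, which can be passed to Hail's plotting helpers for Manhattan and Q-Q plots.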
