Nahrwhals Phasing

Phase a VCF into 1000 genomes SNPs by NYCG.

Description

The Snakefile takes as input a series of phased SNP VCFs (one per chromosome), and the NW output folders. It checks all aligned reads, assigns them a phase, and determines statistics for the most likely phase. It then carries out this re-phasing.

Final output, is a re-phased vcf in res/vcf.

Workflow

Given the different post-processing steps, running NWs in its entirety requires some steps:

(1) Running NW

Run the ntk-scan-snakemake pipeline which repeatedly calls Nahrwhals (found in another repo):

ssh seneca
cd /g/korbel/hoeps/projects/nahr/ntk_interesting_loci/ntk-scan-snakemake
## Edit configs 
conda activate nahrwhals-big

This will produce an all.tsv.

Convert all.tsv to all.vcf using:

R
nahrwhals::write_vcf('/path/to/all.tsv', '/path/to/all.vcf')

(2) Phasing with External Reference

If you also want to phase with respect to an external phasing, edit the config/Snake.config.json to indicate binaries for a lot of tools, and links to the reference fasta, NYGC VCF, and NAHRwhals output path and vcf. Files that have to be custom-made for this workflow are:

region_file: (can sometimes be recycled from "intervals.txt" from Nahrwhals)

chr1-103543499-103821114
chr1-108148088-108537579
chr1-109662781-109726958
...

sample_file:
```
GM19129
GM19434
HG00171
...
```

Ensure you have a working R environment loaded. If on Seneca, use:

```
conda deactivate # (a few times if necessary)
conda activate nahrwhals-big
```

Run the workflow with:

```
snakemake --cores n
```

Post-processing Steps

Merging BCF Files

After running the Snakefile, you need to post-process the results to merge the BCF files:

First, convert the NW_flipped_50000.vcf produced from the Snakemake part to a BCF file:

bcftools view -O b NW_flipped_50000.vcf -o NW_flipped_50000.bcf
tabix -p bcf NW_flipped_50000.bcf

Concatenate the individual allsample.vcfs produced from the Snakemake. (This is untested, maybe need some more flags or so)
```
bcftools concat /path/to/nygc_subsets/*_allsamples* -O z -o allregions_allsamples.vcf.gz
```

Run the BCF merging script with the generated BCF files:

./merge_bcfs.sh custom_header.txt NW_flipped_50000.bcf allregions_allsamples.vcf.gz merged_output.vcf.gz

Name		Name	Last commit message	Last commit date
Latest commit History 124 Commits
config		config
data		data
plots_pca_scripts		plots_pca_scripts
workflow		workflow
.gitignore		.gitignore
README.md		README.md
dag_one_region.png		dag_one_region.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Nahrwhals Phasing

Table of Contents

Description

Workflow

(1) Running NW

(2) Phasing with External Reference

Post-processing Steps

Merging BCF Files

About

Releases

Packages

Languages

WHops/nahrwhals_phasing

Folders and files

Latest commit

History

Repository files navigation

Nahrwhals Phasing

Table of Contents

Description

Workflow

(1) Running NW

(2) Phasing with External Reference

Post-processing Steps

Merging BCF Files

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages