Go to the main README

Data cleaning workflow

This is a brief description of our back-end data cleaning process. The intermediate dataset generated by this workflow is the input to the next step in the pipeline: LD pruning.

1. Obtaining GWAS data:

CAD GWAS

CAD - CARDIoGRAMplusC4D 1000 Genomes-based GWAS (additive model)

CARDIoGRAMplusC4D Consortium

Metabolite GWAS - urine

Raffler et al. 2015 study

Metabolite profiling was performed by NMR (Chenomx); Chenomx IDs were mapped to KEGG IDs.

Metabolite GWAS - serum

Shin et al. 2014 study

Metabolite profiling was performed by MS (Metabolon).

Metabolites: Serum & Urine

2. Merge the Shin et al. association output files

 ./merge_gwas_assoc.sh
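The internals of `merge_gwas_assoc.sh` are not shown here, so the following is only a minimal Python sketch of what a merge of this kind might look like. It assumes (hypothetically) one tab-delimited association file per metabolite, all sharing the same header row, and it records the source file name in an added `metabolite` column. The function name, the column name, and the file layout are illustrative assumptions, not the script's actual behaviour.

```python
import csv
from pathlib import Path


def merge_assoc_files(paths, out_path, delimiter="\t"):
    """Concatenate per-metabolite association files into one table.

    Assumption: every input file has the same header row. The file stem
    is written to a new first column named "metabolite" (hypothetical).
    Returns the number of data rows written.
    """
    header = None
    n_rows = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
        for path in map(Path, paths):
            with open(path, newline="") as fh:
                reader = csv.reader(fh, delimiter=delimiter)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    # First non-empty file defines the output header.
                    header = file_header
                    writer.writerow(["metabolite"] + header)
                elif file_header != header:
                    raise ValueError(f"header mismatch in {path}")
                for row in reader:
                    writer.writerow([path.stem] + row)
                    n_rows += 1
    return n_rows
```

Tagging each row with its source file keeps the metabolite identity recoverable after the files are combined, which the later filtering and intersection steps would need.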

3. Filter the data with a P-value threshold

The P-value threshold is currently set to 10^-5.

 ./filter_data_pval.pl <association_file> <Pvalue_cutoff> <cad | serum | urine>

The output file from this step will be formatted to be suitable for the LD pruning process.
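To illustrate the filtering logic, here is a minimal Python sketch; it is not the code of `filter_data_pval.pl`. It assumes a tab-delimited file with a header and a P-value column named `P` (a hypothetical column name), and it keeps rows strictly below the cutoff. Whether the real script uses a strict or inclusive comparison, and how it reformats the output for LD pruning, is not stated in this document.

```python
import csv


def filter_by_pvalue(in_path, out_path, cutoff=1e-5, p_column="P", delimiter="\t"):
    """Write only the rows whose P-value is below `cutoff`.

    Assumptions: the input has a header row and a column named
    `p_column` (hypothetical default "P"). Rows with a missing or
    non-numeric P-value are skipped. Returns the number of rows kept.
    """
    kept = 0
    with open(in_path, newline="") as fh, open(out_path, "w", newline="") as out:
        reader = csv.DictReader(fh, delimiter=delimiter)
        writer = csv.DictWriter(
            out,
            fieldnames=reader.fieldnames,
            delimiter=delimiter,
            lineterminator="\n",
            extrasaction="ignore",  # tolerate rows with surplus fields
        )
        writer.writeheader()
        for row in reader:
            try:
                p = float(row[p_column])
            except (TypeError, ValueError):
                continue  # e.g. "NA" or an empty field
            if p < cutoff:
                writer.writerow(row)
                kept += 1
    return kept
```

With the default cutoff of `1e-5`, a row with P = 3e-7 would be kept and a row with P = 0.02 would be dropped.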

4. (Optional) Obtain the subset of SNPs that are present in both the CAD and the metabolite datasets

 ./get_intersect_snps.pl <trait_assoc_file> <metab_assoc_file> <serum | urine>
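The intersection step can be sketched in Python as a set intersection on SNP identifiers. This is an illustration under stated assumptions rather than the logic of `get_intersect_snps.pl`: both files are assumed to be tab-delimited with a header and an identifier column named `SNP` (a hypothetical name), and SNPs are matched on identifier only.

```python
import csv


def read_snp_ids(path, snp_column="SNP", delimiter="\t"):
    """Return the set of SNP identifiers found in one association file.

    Assumption: the file has a header row containing `snp_column`
    (hypothetical default "SNP").
    """
    with open(path, newline="") as fh:
        return {row[snp_column] for row in csv.DictReader(fh, delimiter=delimiter)}


def intersect_snps(trait_path, metab_path, snp_column="SNP", delimiter="\t"):
    """Return a sorted list of SNP IDs present in both files."""
    trait_ids = read_snp_ids(trait_path, snp_column, delimiter)
    metab_ids = read_snp_ids(metab_path, snp_column, delimiter)
    return sorted(trait_ids & metab_ids)
```

Matching on identifier alone ignores allele coding and genome build, so a real pipeline may need additional harmonisation before the two datasets are compared.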