Data cleaning workflow

This is a brief description of our back-end data cleaning process. The intermediate dataset generated by this workflow is the input to the next step in the pipeline: LD pruning.

1. Obtaining GWAS data:

CAD GWAS

CAD: CARDIoGRAMplusC4D 1000 Genomes-based GWAS (additive model)

CARDIoGRAMplusC4D Consortium

Metabolite GWAS - urine

Raffler et al. 2015 study

Metabolite profiling was performed by NMR (Chenomx); Chenomx IDs were mapped to KEGG IDs.

Metabolite GWAS - serum

Shin et al. 2014 study

Metabolite profiling was performed by MS (Metabolon)

Metabolites: Serum & Urine

2. Merge the Shin et al. association output files

 ./merge_gwas_assoc.sh
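
The internals of `merge_gwas_assoc.sh` are not shown here. As a rough illustration of what merging per-metabolite association files could look like, the Python sketch below concatenates several tab-delimited tables and tags each row with its source metabolite. The function name `merge_assoc`, the column names `snp` and `pvalue`, and the metabolite IDs are all hypothetical and are not taken from the Shin et al. files.

```python
import csv
import io

def merge_assoc(named_files):
    """Concatenate per-metabolite association tables into one list of rows.

    named_files: iterable of (metabolite_id, file_handle) pairs (hypothetical layout).
    Each row gains a 'metabolite' field recording which file it came from.
    """
    merged = []
    for metabolite, handle in named_files:
        for row in csv.DictReader(handle, delimiter="\t"):
            row["metabolite"] = metabolite
            merged.append(row)
    return merged

# Two tiny made-up inputs standing in for per-metabolite association files.
file_a = io.StringIO("snp\tpvalue\nrs1\t1e-6\n")
file_b = io.StringIO("snp\tpvalue\nrs2\t0.2\nrs3\t3e-8\n")
merged = merge_assoc([("M00001", file_a), ("M00002", file_b)])
```

Keeping the metabolite ID on every row means later steps can work on a single file without losing track of which metabolite each association belongs to.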

3. Filter the data with a P-value threshold

The P-value threshold is currently set to 10^-5.

 ./filter_data_pval.pl <association_file> <Pvalue_cutoff> <cad | serum | urine>

The output file from this step is formatted for the LD pruning process.
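
To make the filtering step concrete, here is a minimal Python sketch of a P-value filter. It is not the logic of `filter_data_pval.pl`; the function name `filter_by_pvalue`, the tab-delimited layout, and the column names `snp` and `pvalue` are assumptions for illustration. Whether the real script uses a strict or inclusive comparison is also an assumption (strict is used here).

```python
import csv
import io

def filter_by_pvalue(handle, cutoff, pcol="pvalue"):
    """Yield rows whose P-value is strictly below the cutoff.

    The column name `pcol` is a hypothetical default, not the real file header.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            p = float(row[pcol])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with a missing or non-numeric P-value
        if p < cutoff:
            yield row

# Made-up association rows; 'NA' stands in for a missing P-value.
demo = io.StringIO("snp\tpvalue\nrs1\t2e-6\nrs2\t0.04\nrs3\tNA\nrs4\t1e-5\n")
kept = [row["snp"] for row in filter_by_pvalue(demo, 1e-5)]
```

With a cutoff of 10^-5, only `rs1` passes in this toy input: `rs2` is above the threshold, `rs3` has no usable P-value, and `rs4` sits exactly on the threshold and is excluded by the strict comparison.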

4. (Optional) Obtain the subset of SNPs that are present in both the CAD and the metabolite datasets

 ./get_intersect_snps.pl <trait_assoc_file> <metab_assoc_file> <serum | urine>
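
The intersection step can be pictured as a set lookup on SNP identifiers. The Python sketch below is only an illustration under assumed inputs, not the implementation of `get_intersect_snps.pl`; the function name `intersect_snps` and the `snp` key are hypothetical.

```python
def intersect_snps(trait_rows, metab_rows, key="snp"):
    """Return the metabolite rows whose SNP ID also appears in the trait rows."""
    trait_ids = {row[key] for row in trait_rows}  # set gives O(1) membership tests
    return [row for row in metab_rows if row[key] in trait_ids]

# Made-up rows standing in for the filtered CAD and serum association files.
cad = [{"snp": "rs1", "pvalue": "2e-6"}, {"snp": "rs2", "pvalue": "4e-7"}]
serum = [{"snp": "rs2", "pvalue": "1e-8"}, {"snp": "rs9", "pvalue": "3e-6"}]
shared = intersect_snps(cad, serum)
```

Here only `rs2` is kept, since it is the one SNP present in both toy datasets.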
