GATK Pipeline Validation Testing #470

alex-hancock · 2016-10-11T18:31:13Z

@fnothaft has concerns (upon which he will elaborate) regarding the testing process.

jpfeil · 2016-10-17T18:01:42Z

The current plan is to run the Ashkenazim trio (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/) on an AWS cluster. The trio consists of three 2x250 Illumina samples, and GIAB provides high-confidence variant calls for each sample for benchmarking. We plan to genotype each sample individually and filter variants using hard filters. The goal is to test a configuration similar to the one the ADAM/GATK comparison will use. It was brought up that testing three samples may not be sufficient. We can find more samples, but each additional sample adds to the cost of the run. How many samples are you planning to run for the ADAM/GATK comparison?
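
For reference, the benchmarking step against the GIAB calls could be sketched roughly as below. This is a minimal illustration, not the pipeline's actual validation code: the `Variant` tuple, the `Benchmark` type, and the `benchmark_calls` function are hypothetical names, and it assumes variants have already been loaded into sets and restricted to the high-confidence regions. Exact-match comparison also ignores differences in variant representation, which dedicated comparison tools handle.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class Benchmark(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    recall: float


def benchmark_calls(called: Set[Variant], truth: Set[Variant]) -> Benchmark:
    """Compare filtered pipeline calls against a truth set by exact match."""
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Benchmark(tp, fp, fn, precision, recall)


if __name__ == "__main__":
    # Toy data for illustration only; not real GIAB variants.
    truth = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")}
    called = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 75, "T", "C")}
    print(benchmark_calls(called, truth))
```

Running this once per sample would give a per-sample precision and recall that can be compared across pipeline configurations.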

fnothaft · 2016-10-17T18:21:02Z

10 for the head-to-head, 260 for ADAM only. I would probably go with a larger dataset than a trio; the Illumina Platinum Pedigree (http://www.illumina.com/platinumgenomes/) is something I would run through. Essentially, I'd like to push at least 1 TB of data through the pipeline.

jpfeil · 2016-10-17T18:37:35Z

Okay! That sounds good. Alex and I will kick off a run with the Platinum Pedigree samples ASAP.

alex-hancock assigned jpfeil Oct 11, 2016
