Description
In light of #756 and further discussion with implementers, it appears that the current definition of GQ is problematic.
GQ (Integer): Conditional genotype quality, encoded as a phred quality −10log10 p(genotype call is wrong, conditioned on the site’s being variant)
The "conditioned on the site’s being variant" clause is problematic because, in practice, the unconditioned genotype quality is a much more meaningful value. If 0/0 is unlikely, then conditioning is unlikely to change the result by a full integer value. If 0/0 is not unlikely, then by conditioning on the site being variant (vague language, which is also problematic) GQ discards information that is actually important to variant interpretation. Furthermore, if the site is actually 0/0 then GQ is meaningless, as p(!A | !A) is always one.
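To make the difference concrete, here is a minimal sketch in Python that computes both readings from a set of genotype posteriors. The function names, the cap at 99, and the example posteriors are all made up for illustration; none of this is taken from the spec or from any caller.

```python
import math


def phred(p_wrong, cap=99):
    """Phred-scale an error probability; the cap of 99 is an assumption."""
    if p_wrong <= 0.0:
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))


def gq_unconditional(posteriors, call):
    """GQ as -10log10 p(genotype call is wrong), with no conditioning."""
    return phred(1.0 - posteriors[call])


def gq_conditional(posteriors, call, ref="0/0"):
    """GQ as -10log10 p(call is wrong | site is variant).

    "Site is variant" is read here as "genotype is not 0/0", which is
    one possible interpretation of the spec wording.
    """
    p_variant = 1.0 - posteriors[ref]
    if p_variant <= 0.0:
        return None  # conditioning on a zero-probability event is undefined
    if call == ref:
        return phred(1.0)  # a 0/0 call is always wrong given a variant site
    return phred(1.0 - posteriors[call] / p_variant)


# Hypothetical posteriors for a diploid biallelic site.
confident_het = {"0/0": 1e-6, "0/1": 0.998999, "1/1": 0.001}
uncertain_het = {"0/0": 0.4, "0/1": 0.59, "1/1": 0.01}

print(gq_unconditional(confident_het, "0/1"), gq_conditional(confident_het, "0/1"))
print(gq_unconditional(uncertain_het, "0/1"), gq_conditional(uncertain_het, "0/1"))
```

When 0/0 is very unlikely the two values agree (30 and 30). When 0/0 has posterior 0.4 the unconditional GQ is 4 while the conditional GQ is 18, so the conditional value hides the fact that the site may well not be variant at all.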
The question I have is, if we were to make this change, what's the best way to go about it? I see three approaches:
- Implement in Vnext?
- Errata into the latest version (4.5)?
- Errata into all VCF versions?
The most widely used variant callers do not follow the current specification (DRAGEN ignores the conditional, GATK compares against the next most likely genotype), but there may be variant callers out there that do, and I don't want to penalise them for following the spec.
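For reference, a rough sketch of how "compare against the next most likely genotype" can differ from an unconditional error probability. This is my reading of the two conventions expressed over phred-scaled likelihoods (PL), assuming a flat genotype prior; the function names and the cap at 99 are hypothetical and this is not code from either caller.

```python
import math


def gq_next_best(pls, cap=99):
    """Gap between the best and second-best phred-scaled likelihoods."""
    ordered = sorted(pls)
    return min(cap, ordered[1] - ordered[0])


def gq_unconditional_from_pls(pls, cap=99):
    """-10log10 p(best genotype is wrong), assuming a flat genotype prior."""
    likelihoods = [10 ** (-pl / 10) for pl in pls]
    total = sum(likelihoods)
    p_wrong = (total - max(likelihoods)) / total
    if p_wrong <= 0.0:
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))


# Hypothetical PLs ordered as 0/0, 0/1, 1/1.
pls = [20, 0, 20]
print(gq_next_best(pls), gq_unconditional_from_pls(pls))
```

With two equally plausible alternatives the next-best comparison gives 20, while the unconditional value is 17 because it sums the error over every alternative genotype. The two agree closely when a single alternative dominates, for example PL = [0, 30, 60].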