
Redefine GQ  #797

@d-cameron

In light of #756 and further discussion with implementers, it appears that the current definition of GQ is problematic.

GQ (Integer): Conditional genotype quality, encoded as a phred quality −10log10 p(genotype call is wrong, conditioned on the site’s being variant)

The "conditioned on the site's being variant" clause is problematic because, in practice, the unconditioned genotype quality is a much more meaningful value. If 0/0 is unlikely, then conditioning is unlikely to change the result by a full integer value. If 0/0 is not unlikely, then by conditioning on the site being variant (vague language, which is also problematic) GQ discards information that is actually important to variant interpretation. Furthermore, if the site is actually 0/0 then GQ is meaningless, as p(!A | !A) is always one.
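To make the three cases above concrete, here is a minimal sketch in Python. It assumes a diploid, biallelic site with genotype posteriors that sum to one; the function names, the posterior values, and the error floor are all made up for illustration and are not taken from the spec or from any caller.

```python
import math

def phred(p_error):
    """Phred-scale an error probability and round to the nearest integer."""
    # Hypothetical floor so log10(0) is never evaluated (caps the result at 100).
    p_error = max(p_error, 1e-10)
    return round(-10 * math.log10(p_error))

def gq_unconditioned(post, call):
    """GQ from p(call is wrong): posterior mass on every other genotype."""
    return phred(sum(p for g, p in post.items() if g != call))

def gq_conditioned_on_variant(post, call, ref="0/0"):
    """GQ from p(call is wrong | site is variant): drop 0/0 and renormalise.

    Assumes the non-reference genotypes carry non-zero posterior mass.
    """
    p_variant = sum(p for g, p in post.items() if g != ref)
    p_wrong = sum(p for g, p in post.items() if g != ref and g != call)
    return phred(p_wrong / p_variant)

# Illustrative posteriors only (not real data).
confident = {"0/0": 0.001, "0/1": 0.989, "1/1": 0.010}  # 0/0 unlikely
borderline = {"0/0": 0.30, "0/1": 0.69, "1/1": 0.01}    # 0/0 not unlikely
homref = {"0/0": 0.98, "0/1": 0.02, "1/1": 0.00}        # site called 0/0

for name, post, call in [("confident", confident, "0/1"),
                         ("borderline", borderline, "0/1"),
                         ("homref", homref, "0/0")]:
    print(name, gq_unconditioned(post, call), gq_conditioned_on_variant(post, call))
```

With these made-up numbers, the confident 0/1 call gets GQ 20 under both definitions, the borderline 0/1 call gets 5 unconditioned but 18 when conditioned on the site being variant, and the 0/0 call gets 17 unconditioned but 0 when conditioned, since the conditional error probability is exactly one.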

The question I have is, if we were to make this change, what's the best way to go about it? I see three approaches:

  • Implement in Vnext?
  • Errata into the latest version (4.5)?
  • Errata into all VCF versions?

The most widely used variant callers do not follow the current specification (DRAGEN ignores the conditional, GATK compares against the next most likely genotype), but there may be variant callers out there that do, and I don't want to penalise them for following the spec.
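For illustration, one possible reading of "compares against the next most likely genotype" is the gap between the two smallest phred-scaled genotype likelihoods (PL). The sketch below is only that reading: the function name and the cap of 99 are assumptions made for this example, and it is not taken from GATK's code.

```python
def gq_next_most_likely(pl, cap=99):
    """Gap between the best and second-best phred-scaled genotype likelihoods.

    `pl` holds one phred-scaled likelihood per genotype (lower is more likely).
    The cap is an assumption for this sketch, not a value from the VCF spec.
    """
    best, second = sorted(pl)[:2]
    return min(second - best, cap)

# Illustrative PL values only (order: 0/0, 0/1, 1/1).
print(gq_next_most_likely([30, 0, 45]))
print(gq_next_most_likely([5, 0, 200]))
print(gq_next_most_likely([0, 120, 900]))
```

Under this reading the value depends only on the runner-up genotype, so it is neither the unconditioned error probability nor the one conditioned on the site being variant.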
