
Releases: google/deepvariant

DeepVariant 0.5.0

30 Jan 22:44
  1. Release two separate models, one for whole genome sequencing (WGS) data and one for whole exome sequencing (WES) data. The exome model significantly improves Indel F1 on exome data.

    • On exome sequencing data (HG002):
      • Indel F1 0.936959 --> 0.961724; SNP F1 0.998636 --> 0.998962
    • On whole genome sequencing data (HG002):
      • Indel F1 0.996632 --> 0.996684; SNP F1 0.999495 --> 0.999542
  2. DeepVariant can now produce gVCF files as output [doc]:
    gVCF files are required as input for analyses that combine variants across a cohort of individuals, such as cohort merging or joint genotyping.

  3. Training data:
    All models are trained with a benchmarking-compatible strategy: we never train on any data from the HG002 sample, or on chromosome 20 from any sample, so both remain available for evaluation.

    • Whole genome sequencing model:
      We trained on both whole genome and whole exome sequencing data.

      • WGS data:
        • HG001: 1 from PrecisionFDA, and 8 replicates from Verily.
        • HG005: 2 from Verily.
      • WES data:
        • HG001: 11 HiSeq2500, 17 HiSeq4000, 50 NovaSeq.
        • HG005: 1 from Oslo University.

      To increase the diversity of the training data, we also used the downsample_fraction flag when making training examples.

    • Whole exome sequencing model:
      We started from the trained WGS model as a checkpoint, then continued training on only the WES data listed above, again using various downsample fractions.

  4. DeepVariant now produces deterministic output by rounding the QUAL field to one decimal place when writing to VCF.

  5. Update the model input data representation from 7 channels to 6.

    • Removed "Op-Len" (CIGAR operation length) as a model feature. In our tests, this makes the model more robust to inputs with different read lengths.
    • Added an example showing how to visualize the examples (model input images) that DeepVariant generates.
  6. Add a post-processing step that eliminates rare cases of inconsistent haplotypes among variant calls [description].

  7. Expand the excluded contigs list to include common problematic contigs on GRCh38 [GitHub issue].

  8. It is now possible to run DeepVariant workflows on GCP with preemptible GPUs.
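
The F1 changes in item 1 are easier to compare when read as residual error (1 - F1) rather than F1 itself. The sketch below is only arithmetic on the numbers quoted above; the helper names are hypothetical.

```python
def residual_error(f1: float) -> float:
    """Distance from a perfect score: 1 - F1."""
    return 1.0 - f1


def error_reduction(f1_before: float, f1_after: float) -> float:
    """Relative reduction in residual error between two releases."""
    before = residual_error(f1_before)
    after = residual_error(f1_after)
    return (before - after) / before


# F1 values copied from item 1 (HG002).
results = {
    "WES Indel": (0.936959, 0.961724),
    "WES SNP": (0.998636, 0.998962),
    "WGS Indel": (0.996632, 0.996684),
    "WGS SNP": (0.999495, 0.999542),
}

for name, (before, after) in results.items():
    print(f"{name}: {error_reduction(before, after):.1%} less residual error")
```

Read this way, the exome Indel change removes roughly 39% of the remaining error, while the WGS Indel change is about 1.5%.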
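
Item 2 matters for cohort analysis because a gVCF also records stretches of confidently non-variant sequence as "reference blocks", so downstream tools can tell "reference here" apart from "no data here". The sketch below classifies a single gVCF data line. It assumes reference blocks use a lone symbolic `<*>` ALT allele plus an `END` INFO key, a common gVCF convention; the exact fields DeepVariant writes are not specified in these notes, and the example lines are made up.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO column such as 'END=10500' into a dict."""
    fields = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value if value else True  # bare keys are flags
    return fields


def classify_record(line: str) -> tuple:
    """Return ('ref_block', start, end) or ('variant', start, end), 1-based inclusive.

    Assumption: a reference block has ALT == '<*>' and an END INFO key.
    """
    cols = line.rstrip("\n").split("\t")
    pos, ref, alt, info = int(cols[1]), cols[3], cols[4], cols[7]
    info_fields = parse_info(info)
    if alt.split(",") == ["<*>"] and "END" in info_fields:
        return ("ref_block", pos, int(info_fields["END"]))
    return ("variant", pos, pos + len(ref) - 1)


# Hypothetical records: a reference block, then a heterozygous SNP.
print(classify_record("chr20\t10000\t.\tA\t<*>\t0\t.\tEND=10500\tGT:GQ\t0/0:50"))
print(classify_record("chr20\t10501\t.\tC\tT,<*>\t30.1\tPASS\t.\tGT\t0/1"))
```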
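
The benchmarking-compatible rule in item 3 amounts to a simple filter over (sample, chromosome) pairs. This is a minimal sketch with a hypothetical helper name, not DeepVariant's code; it checks both `20` and `chr20` because GRCh37 and GRCh38 name chromosomes differently.

```python
HELD_OUT_SAMPLE = "HG002"          # never used for training
HELD_OUT_CHROMS = {"20", "chr20"}  # held out from every sample


def usable_for_training(sample: str, chrom: str) -> bool:
    """True if data from this sample and chromosome may be used for training."""
    if sample == HELD_OUT_SAMPLE:
        return False
    return chrom not in HELD_OUT_CHROMS


for sample, chrom in [("HG001", "chr1"), ("HG001", "chr20"), ("HG002", "chr1")]:
    print(sample, chrom, usable_for_training(sample, chrom))
```

Because HG002 and chromosome 20 never reach training, they can serve as an evaluation set, which is why the F1 numbers in item 1 are reported on HG002.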
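
Item 3 mentions the downsample_fraction flag. The sketch below only illustrates the general idea of fraction-based downsampling (keep each read independently with probability equal to the fraction); how DeepVariant implements the flag internally is not described here, and `downsample_reads` is a hypothetical name.

```python
import random


def downsample_reads(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction`.

    A fixed seed makes the chosen subset reproducible across runs.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


reads = [f"read{i}" for i in range(10000)]
half = downsample_reads(reads, 0.5)
print(len(half))  # close to 5000, but not exactly
```

Training on several such fractions exposes the model to lower-coverage versions of the same samples, which is the added diversity the notes describe.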
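
Item 4 relies on the fact that tiny floating-point differences between runs disappear once QUAL is rounded to one decimal place. The sketch uses Python string formatting as a stand-in; DeepVariant's actual rounding code may differ, and the input values are made up.

```python
def format_qual(qual: float) -> str:
    """Format a QUAL value rounded to one decimal place."""
    return f"{qual:.1f}"


# Two runs whose raw scores differ only in low-order floating-point bits.
run_a = 37.2400000001
run_b = 37.2399999998
print(format_qual(run_a), format_qual(run_b))  # 37.2 37.2
```

Values that land almost exactly on a rounding boundary (such as x.x5) could in principle still round differently, but such ties are rare.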
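
One way to picture the inconsistency that item 6 targets: on a diploid genome, overlapping variant calls cannot together require more than two alternate haplotypes, so two overlapping deletions both called homozygous-alternate cannot both be right. The pairwise check below is a hypothetical illustration of that idea, not the post-processing algorithm DeepVariant uses, and it ignores overlaps among three or more calls.

```python
from dataclasses import dataclass


@dataclass
class Call:
    start: int        # 0-based, inclusive
    end: int          # 0-based, exclusive
    genotype: tuple   # e.g. (0, 1) for het, (1, 1) for hom-alt


def overlaps(a: Call, b: Call) -> bool:
    return a.start < b.end and b.start < a.end


def alt_count(call: Call) -> int:
    """Number of haplotypes carrying a non-reference allele."""
    return sum(1 for allele in call.genotype if allele > 0)


def inconsistent_pairs(calls):
    """Overlapping call pairs whose alt alleles need more than two haplotypes."""
    bad = []
    for i, a in enumerate(calls):
        for b in calls[i + 1:]:
            if overlaps(a, b) and alt_count(a) + alt_count(b) > 2:
                bad.append((a, b))
    return bad


calls = [Call(100, 105, (1, 1)), Call(102, 108, (1, 1)), Call(200, 201, (0, 1))]
print(len(inconsistent_pairs(calls)))  # 1
```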
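
Item 7 expands the list of contigs DeepVariant skips. The sketch below shows contig filtering with shell-style patterns. The patterns are hypothetical examples of GRCh38 contig families, not DeepVariant's actual list, and whether the real list uses exact names or patterns is an assumption left open here.

```python
import fnmatch

# Hypothetical exclusion patterns for illustration only.
EXCLUDED_PATTERNS = ["chrM", "MT", "*_alt", "*_random", "chrUn_*", "chrEBV", "HLA-*"]


def keep_contig(name: str) -> bool:
    """True unless the contig name matches an exclusion pattern."""
    return not any(fnmatch.fnmatchcase(name, pat) for pat in EXCLUDED_PATTERNS)


contigs = [
    "chr1", "chr20", "chrM", "chr1_KI270706v1_random",
    "chrUn_KI270302v1", "chr6_GL000250v2_alt", "chrEBV", "HLA-A*01:01:01:01",
]
print([c for c in contigs if keep_contig(c)])  # ['chr1', 'chr20']
```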

DeepVariant 0.4.1

13 Dec 00:32

This fixes a problem with htslib_gcp_oauth when network access is unavailable.

DeepVariant 0.4.0

04 Dec 17:52

This is the initial open source release of DeepVariant!

It includes a model trained on 9 replicates of NA12878 / HG001, plus a copy of each replicate downsampled to 50% coverage. In our tests, this additional training data lets DeepVariant generalize to a wider variety of input sequencing data. In total, this produced approximately 100 million training examples. We use the Genome in a Bottle truth set v3.3.2 for training. The underlying model is Inception V3.

See historical release notes for more details.