Releases: google/deepvariant
DeepVariant 0.5.0
- Release two separate models for calling genome and exome sequencing data. Significant improvement in Indel F1 on exome data.
  - On exome sequencing data (HG002):
    - Indel F1 0.936959 --> 0.961724; SNP F1 0.998636 --> 0.998962
  - On whole genome sequencing data (HG002):
    - Indel F1 0.996632 --> 0.996684; SNP F1 0.999495 --> 0.999542
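One way to read the F1 changes listed above is as a relative reduction in error (1 - F1). The sketch below is plain arithmetic on the figures quoted in these notes; the helper name `relative_error_reduction` is our own illustration and is not part of DeepVariant.

```python
def relative_error_reduction(f1_before: float, f1_after: float) -> float:
    """Fraction by which the error (1 - F1) shrinks between two releases."""
    error_before = 1.0 - f1_before
    error_after = 1.0 - f1_after
    return (error_before - error_after) / error_before


# F1 values for HG002 as quoted in the release notes (before, after).
F1_CHANGES = {
    ("exome", "indel"): (0.936959, 0.961724),
    ("exome", "snp"): (0.998636, 0.998962),
    ("genome", "indel"): (0.996632, 0.996684),
    ("genome", "snp"): (0.999495, 0.999542),
}

for (data, variant_type), (before, after) in F1_CHANGES.items():
    reduction = relative_error_reduction(before, after)
    print(f"{data:6s} {variant_type:5s} error reduced by {reduction:.1%}")
```

Read this way, the exome Indel change corresponds to roughly a 39% reduction in error, while the whole genome Indel change is under 2%.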
- Provide capability to produce gVCF files as output from DeepVariant [doc]:
  gVCF files are required as input for analyses that create a set of variants in a cohort of individuals, such as cohort merging or joint genotyping.
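For readers new to gVCF: unlike a plain VCF, a gVCF also carries records for non-variant regions, which is what makes cohort merging possible. The sketch below parses one data line and reports the interval it covers. It assumes the common gVCF conventions of a symbolic `<*>` allele and an `END` INFO key for reference blocks; the sample lines are made up for illustration and are not real DeepVariant output, and the exact fields DeepVariant writes may differ.

```python
def parse_gvcf_record(line: str) -> dict:
    """Parse one tab-separated gVCF data line into the span it covers."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs (or '.' when empty).
    info_map = {}
    for item in info.split(";"):
        if item and item != ".":
            key, _, value = item.partition("=")
            info_map[key] = value

    start = int(pos)
    # A reference block states its last position in END; otherwise the
    # record spans the length of the REF allele.
    end = int(info_map["END"]) if "END" in info_map else start + len(ref) - 1

    # A record whose ALT alleles are all symbolic (e.g. "<*>") calls no variant.
    is_reference_block = all(a.startswith("<") for a in alt.split(","))
    return {
        "chrom": chrom,
        "start": start,
        "end": end,
        "is_reference_block": is_reference_block,
    }


# Illustrative lines only (hypothetical values).
block_line = "chr1\t10001\t.\tT\t<*>\t0\t.\tEND=10050\tGT:GQ:MIN_DP\t0/0:50:30"
variant_line = "chr1\t10051\t.\tAC\tA,<*>\t42.3\tPASS\t.\tGT:GQ\t0/1:40"

print(parse_gvcf_record(block_line))
print(parse_gvcf_record(variant_line))
```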
- Training data:
  All models are trained with a benchmarking-compatible strategy: we never train on any data from the HG002 sample, or on chromosome 20 from any sample.
  - Whole genome sequencing model:
    We used training data from both genome sequencing and exome sequencing.
    - WGS data:
      - HG001: 1 from PrecisionFDA, and 8 replicates from Verily.
      - HG005: 2 from Verily.
    - WES data:
      - HG001: 11 HiSeq2500, 17 HiSeq4000, 50 NovaSeq.
      - HG005: 1 from Oslo University.
    To increase the diversity of the training data, we also used the `downsample_fraction` flag when making training examples.
  - Whole exome sequencing model:
    We started from a trained WGS model as a checkpoint, then continued to train only on the WES data above. We also used various downsample fractions for the training data.
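The two ideas in the training data notes, holding out HG002 and chromosome 20 and downsampling reads to diversify coverage, can be sketched as follows. This is a minimal illustration under our own assumptions, not DeepVariant's implementation: the function names, the contig spellings, and the per-read random selection are all hypothetical.

```python
import random

# Held out so that benchmarks on them remain unbiased (per the notes above).
HELD_OUT_SAMPLES = {"HG002"}
HELD_OUT_CONTIGS = {"chr20", "20"}  # assumed spellings for chromosome 20


def usable_for_training(sample: str, contig: str) -> bool:
    """True if data from this sample and contig may be used for training."""
    return sample not in HELD_OUT_SAMPLES and contig not in HELD_OUT_CONTIGS


def downsample(reads, fraction: float, seed: int = 0) -> list:
    """Keep each read independently with probability `fraction`."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # seeded so the selection is repeatable
    return [read for read in reads if rng.random() < fraction]


candidates = [("HG001", "chr1"), ("HG001", "chr20"), ("HG002", "chr1")]
print([c for c in candidates if usable_for_training(*c)])

reads = [f"read_{i}" for i in range(10)]
print(downsample(reads, fraction=0.5, seed=42))
```

Seeding the random generator keeps the selected subset repeatable, which makes a downsampled training set easier to reproduce.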
- DeepVariant now provides deterministic output by rounding the QUAL field to one digit past the decimal point when writing to VCF.
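To illustrate why rounding helps: two runs whose QUAL values differ only slightly (for example, from small floating-point differences) write the same text once rounded to one decimal place. The helper `format_qual` and the values below are hypothetical and only sketch the idea; they are not DeepVariant's code.

```python
def format_qual(qual: float) -> str:
    """Format a QUAL value with exactly one digit past the decimal point."""
    return f"{qual:.1f}"


# Hypothetical QUAL values from two runs that differ by a tiny amount.
run_a = 37.2841
run_b = 37.2839

print(format_qual(run_a), format_qual(run_b))
print(format_qual(run_a) == format_qual(run_b))
```

Values that fall very close to a rounding boundary can still round differently, so rounding reduces rather than strictly eliminates this kind of variation in the general case.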
- Update the model input data representation from 7 channels to 6.
  - Removal of "Op-Len" (CIGAR operation length) as a model feature. In our tests this makes the model more robust to input that has different read lengths.
  - Added an example for visualizing examples.
- Add a post-processing step to variant calls to eliminate rare inconsistent haplotypes [description].
- Expand the excluded contigs list to include common problematic contigs on GRCh38 [GitHub issue].
- It is now possible to run DeepVariant workflows on GCP with pre-emptible GPUs.
DeepVariant 0.4.1
This fixes a problem with htslib_gcp_oauth when network access is unavailable.
DeepVariant 0.4.0
This is the initial open source release of DeepVariant!
It includes a model trained on 9 replicates of NA12878 / HG001, as well as a copy of each replicate downsampled to 50% coverage. In our tests, this additional training data helps DeepVariant generalize to a wider variety of input sequencing data. This produced approximately 100 million training examples. We use the truth set v3.3.2 from Genome in a Bottle for training. The underlying model is Inception V3.
See historical release notes for more details.