
v1.10.0-beta

@danielecook released this 09 Oct 17:46
46027eb

This beta release focuses solely on DeepVariant, with no updates for pangenome-aware DeepVariant or DeepTrio. We encourage users to provide feedback, report bugs, and offer suggestions to help us improve.

  • Code is available on the r1.10.0-beta branch.
  • Docker: google/deepvariant:1.10.0-beta
  • Docker (GPU): google/deepvariant:1.10.0-beta-gpu
  • We have updated the metrics page with the latest accuracy / runtime results.
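
To get started quickly, the commands below show one way to pull the beta images and check out the source. This is a sketch rather than an official install recipe: the repository URL is the public DeepVariant GitHub repository, and the tags and branch are the ones listed above.

# Pull the beta Docker images (CPU and GPU):
docker pull google/deepvariant:1.10.0-beta
docker pull google/deepvariant:1.10.0-beta-gpu

# Or fetch the source at the corresponding branch:
git clone --branch r1.10.0-beta https://github.com/google/deepvariant.git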

Key updates are detailed below.

Continuous Phasing

It is now possible for DeepVariant to natively emit a phased VCF for long reads (PacBio and ONT), leveraging the long-range information from these reads to accurately phase variants and assign a haplotype.

To enable this feature, you must set the following flags when running with run_deepvariant:

--make_examples_extra_args="phase_reads=true,output_phase_info=true,output_local_read_phasing=/tmp/read-phasing_debug@${N_SHARDS}.tsv" \
--postprocess_variants_extra_args="phased_reads_input_path=/tmp/read-phasing_debug@${N_SHARDS}.tsv"

Make sure that N_SHARDS matches the number of shards used for the rest of the run (i.e., the value passed to --num_shards).
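
For context, here is a minimal sketch of a full run_deepvariant invocation with these flags wired in. The reference, BAM, output paths, and N_SHARDS value are placeholders, and the surrounding flags (--model_type, --ref, --reads, --output_vcf, --num_shards) are shown as typically used rather than prescribed by this release.

N_SHARDS=16

# Sketch of a phased long-read run; adjust paths and shard count for your data.
run_deepvariant \
  --model_type PACBIO \
  --ref hg38.fa \
  --reads pacbio_input.bam \
  --output_vcf output.vcf.gz \
  --num_shards ${N_SHARDS} \
  --make_examples_extra_args="phase_reads=true,output_phase_info=true,output_local_read_phasing=/tmp/read-phasing_debug@${N_SHARDS}.tsv" \
  --postprocess_variants_extra_args="phased_reads_input_path=/tmp/read-phasing_debug@${N_SHARDS}.tsv"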

model.example_info.json

Models can now be packaged with an extra file called model.example_info.json which carries the flags needed to generate examples (model inputs) when running inference. Here is an example of what this looks like:

{
  "version": "1.10.0-beta",
  "shape": [100, 147, 10],
  "channels": [1, 2, 3, 4, 5, 6, 7, 26, 9, 10],
  "flags_for_calling": {
    "alt_aligned_pileup": "diff_channels",
    "call_small_model_examples": true,
    "keep_supplementary_alignments": true,
    "max_reads_per_partition": 600,
    "min_mapping_quality": 1,
    "parse_sam_aux_fields": true,
    "partition_size": 25000,
    "phase_reads": true,
    "pileup_image_height": 100,
    "pileup_image_width": 147,
    "realign_reads": false,
    "small_model_indel_gq_threshold": 16,
    "small_model_snp_gq_threshold": 15,
    "small_model_vaf_context_window_size": 51,
    "sort_by_haplotypes": true,
    "track_ref_reads": true,
    "trained_small_model_path": "/opt/smallmodels/pacbio",
    "trim_reads_for_pileup": true,
    "vsc_min_fraction_indels": 0.12
  }
}

The flags used to generate examples are specific to each model, and it is important that they are set correctly so that the generated examples match the characteristics of the data the model was trained on.
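
As a quick illustration, you could inspect the packaged flags before building a pipeline around them. This assumes jq is installed and that the file sits alongside the model checkpoint directory used in the example further below; both are assumptions of this sketch.

# Print the version, input shape, and example-generation flags packaged with a model:
jq '.version, .shape, .flags_for_calling' /opt/models/pacbio/model.example_info.json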

How is model.example_info.json useful?

DeepVariant can be run in two ways. The first way is to use the run_deepvariant command, which automatically sets options and runs each stage of DeepVariant.

The second way is to run these stages (make_examples, call_variants, and postprocess_variants) individually. This method can be significantly faster and more efficient because make_examples and call_variants can be parallelized, even across multiple machines. However, this approach previously required that the flags for make_examples be set manually, which made constructing efficient pipelines tricky. With this change, users can provide the make_examples stage with the --checkpoint flag, and the model.example_info.json file will be read and used to set the flags appropriate for the given model.

Using model.example_info.json:

Here is an example illustrating how you could make use of this setup:

make_examples \
  --mode calling \
  --ref hg38.fa \
  --reads pacbio_input.bam \
  --examples "make_examples.tfrecord@${N_SHARDS}.gz" \
  --checkpoint "/opt/models/pacbio" \
  --task=1

The logs should report the flags that are then set using model.example_info.json:

[make_examples_core.py:3794] Flags for calling:
alt_aligned_pileup: diff_channels
call_small_model_examples: True
keep_supplementary_alignments: True
…
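
To round out the picture, the remaining stages might be invoked roughly as follows. The file names are illustrative, and the flags shown (--examples, --checkpoint, --outfile for call_variants; --ref, --infile, --outfile for postprocess_variants) should be checked against the documentation for this release before copying them.

# Run inference on the sharded examples produced above:
call_variants \
  --examples "make_examples.tfrecord@${N_SHARDS}.gz" \
  --checkpoint "/opt/models/pacbio" \
  --outfile "call_variants_output.tfrecord.gz"

# Convert the call_variants output into a final VCF:
postprocess_variants \
  --ref hg38.fa \
  --infile "call_variants_output.tfrecord.gz" \
  --outfile "output.vcf.gz"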

Docker Images are Streamlined

Docker images have been simplified to have fewer layers and to remove unnecessary files. The table below shows the difference in disk size and number of layers.

Version      Size   Number of Layers
1.9          6.1GB  114
1.10.0-beta  4.8GB  23

This reduces the size by ~21% and the number of layers by ~80%.
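
If you want to verify the numbers locally, something along these lines should work; note that Docker's reported size and layer count may be computed slightly differently from the table above.

docker pull google/deepvariant:1.10.0-beta

# Report the locally stored image size:
docker images google/deepvariant:1.10.0-beta

# Count the filesystem layers in the image:
docker inspect --format '{{len .RootFS.Layers}}' google/deepvariant:1.10.0-beta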

Additional Updates

This list is not exhaustive, and smaller bug fixes and improvements may not be listed here.

  • The ONT model now uses a new input channel, READ_SUPPORTS_VARIANT_FUZZY, which indicates support for a variant based on a fuzzy, rather than exact, match.
  • The ONT model now sets alt_aligned_pileup='rows', meaning that alternative alignments are encoded as additional pileup rows in the model input rather than as additional channels.
  • The PacBio model now uses the --keep_supplementary_alignments flag, which leads to a slight improvement in accuracy.
  • TensorFlow has been updated from 2.13.1 to 2.16.1.
  • CUDA has been updated from 11.8 to 12.3 and cuDNN has been updated from 8.6.0 to 8.9.0 in our GPU Docker image.
  • std::stable_sort is now used instead of std::sort for pileup image rows, which ensures consistent pileup image generation.

Note: Some outputs (e.g., the VCF header) may still report v1.9, as we did not update all version references.