This beta release focuses solely on DeepVariant, with no updates for pangenome-aware DeepVariant or DeepTrio. We encourage users to provide feedback, report bugs, and offer suggestions to help us improve.
- Code is available on the r1.10.0-beta branch.
- Docker: `google/deepvariant:1.10.0-beta`
- Docker (GPU): `google/deepvariant:1.10.0-beta-gpu`
- We have updated the metrics page with the latest accuracy and runtime results.
Key updates are detailed below.
## Continuous Phasing
It is now possible for DeepVariant to natively emit a phased VCF for long reads (PacBio and ONT), leveraging the long-range information from these reads to accurately phase variants and assign a haplotype.
To enable this feature, set the following flags when running `run_deepvariant`:

```shell
--make_examples_extra_args="phase_reads=true,output_phase_info=true,output_local_read_phasing=/tmp/read-phasing_debug@${N_SHARDS}.tsv" \
--postprocess_variants_extra_args="phased_reads_input_path=/tmp/read-phasing_debug@${N_SHARDS}.tsv"
```

Make sure that `N_SHARDS` matches the shard count used for the rest of the run.
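As a sketch of how the two flag strings above relate, here is a small hypothetical helper (not part of DeepVariant) that builds both strings from a single shard count, so the sharded TSV path cannot drift out of sync between the two stages:

```python
# Hypothetical helper: build the make_examples / postprocess_variants
# extra-args strings shown above from one shard count.
def phasing_flags(n_shards, tsv_prefix="/tmp/read-phasing_debug"):
    # Sharded file spec: prefix@N.tsv, as in the example flags above.
    sharded = f"{tsv_prefix}@{n_shards}.tsv"
    make_examples_args = (
        "phase_reads=true,output_phase_info=true,"
        f"output_local_read_phasing={sharded}"
    )
    postprocess_args = f"phased_reads_input_path={sharded}"
    return make_examples_args, postprocess_args

me_args, pp_args = phasing_flags(64)
```

Both strings point at the same sharded path, which is exactly the invariant the note about `N_SHARDS` is asking you to maintain.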
## `model.example_info.json`
Models can now be packaged with an extra file, `model.example_info.json`, which carries the flags needed to generate examples (model inputs) at inference time. Here is an example of what this file looks like:
```json
{
  "version": "1.10.0-beta",
  "shape": [100, 147, 10],
  "channels": [1, 2, 3, 4, 5, 6, 7, 26, 9, 10],
  "flags_for_calling": {
    "alt_aligned_pileup": "diff_channels",
    "call_small_model_examples": true,
    "keep_supplementary_alignments": true,
    "max_reads_per_partition": 600,
    "min_mapping_quality": 1,
    "parse_sam_aux_fields": true,
    "partition_size": 25000,
    "phase_reads": true,
    "pileup_image_height": 100,
    "pileup_image_width": 147,
    "realign_reads": false,
    "small_model_indel_gq_threshold": 16,
    "small_model_snp_gq_threshold": 15,
    "small_model_vaf_context_window_size": 51,
    "sort_by_haplotypes": true,
    "track_ref_reads": true,
    "trained_small_model_path": "/opt/smallmodels/pacbio",
    "trim_reads_for_pileup": true,
    "vsc_min_fraction_indels": 0.12
  }
}
```
The flags used to generate examples are specific to each model, and it is important that they are set correctly so that the examples match the characteristics the model was trained on.
### How is `model.example_info.json` useful?
DeepVariant can be run in two ways. The first is to use the `run_deepvariant` command, which automatically sets options and runs each stage of DeepVariant.

The second is to run the stages (`make_examples`, `call_variants`, and `postprocess_variants`) individually. This method can be significantly faster and more efficient because `make_examples` and `call_variants` can be parallelized, even across multiple machines. Previously, however, this approach required setting the flags for `make_examples` manually, which made constructing efficient pipelines tricky. With this change, users can pass the `--checkpoint` flag to the `make_examples` stage, and `model.example_info.json` will be read and used to set the flags appropriate for the given model.
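To make the idea concrete, here is a hypothetical sketch (not DeepVariant's actual internal code) of how the `flags_for_calling` section of a `model.example_info.json` could be turned into command-line flags for `make_examples`; the `--flag` / `--noflag` boolean convention is an assumption:

```python
import json

# Hypothetical helper: convert the "flags_for_calling" section of a
# model.example_info.json into command-line flags for make_examples.
def flags_from_example_info(info):
    args = []
    for name, value in sorted(info["flags_for_calling"].items()):
        if isinstance(value, bool):
            # Assumed absl-style boolean convention: --flag / --noflag.
            args.append(f"--{name}" if value else f"--no{name}")
        else:
            args.append(f"--{name}={value}")
    return args

# Abbreviated version of the example file shown above.
info = json.loads("""
{
  "version": "1.10.0-beta",
  "flags_for_calling": {
    "alt_aligned_pileup": "diff_channels",
    "phase_reads": true,
    "realign_reads": false,
    "partition_size": 25000
  }
}
""")
args = flags_from_example_info(info)
```

The point is that the per-model flag values travel with the model file itself, rather than being hard-coded into each pipeline.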
### Using `model.example_info.json`
Here is an example illustrating how you could make use of this setup:
```shell
make_examples \
  --mode calling \
  --ref hg38.fa \
  --reads pacbio_input.bam \
  --examples "[email protected]" \
  --checkpoint "/opt/models/pacbio" \
  --task=1
```

The logs should report the flags that are then set using `model.example_info.json`:
```
[make_examples_core.py:3794] Flags for calling:
alt_aligned_pileup: diff_channels
call_small_model_examples: True
keep_supplementary_alignments: True
…
```
## Docker Images are Streamlined
Docker images have been simplified to use fewer layers and to remove unnecessary files. The table below shows the difference in disk size and number of layers.
| Version | Size | Number of Layers |
|---|---|---|
| 1.9 | 6.1GB | 114 |
| 1.10.0-beta | 4.8GB | 23 |
This reduces the size by ~21% and the number of layers by ~80%.
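The quoted reductions follow directly from the table and can be checked with a quick calculation:

```python
# Sizes (GB) and layer counts from the table above.
old_size, new_size = 6.1, 4.8
old_layers, new_layers = 114, 23

size_cut = (old_size - new_size) / old_size        # fraction of size removed
layer_cut = (old_layers - new_layers) / old_layers  # fraction of layers removed
```

This gives roughly a 21% size reduction and an 80% reduction in layer count, matching the figures above.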
## Additional Updates
This list is not exhaustive, and smaller bug fixes and improvements may not be listed here.
- The ONT model now uses a new input channel, `READ_SUPPORTS_VARIANT_FUZZY`, that indicates support for a variant based on a fuzzy, rather than exact, match.
- The ONT model now sets `alt_aligned_pileup='rows'`, meaning that alternative alignments are encoded using additional pileup rows in the model input, rather than additional channels.
- The PacBio model now uses the `--keep_supplementary_alignments` flag, which leads to a slight improvement in accuracy.
- TensorFlow updated from `2.13.1` to `2.16.1`.
- CUDA has been updated from `11.8` to `12.3` and cuDNN has been updated from `8.6.0` to `8.9.0` in our GPU Docker image.
- Use `std::stable_sort` instead of `std::sort` for pileup image rows. This leads to consistent pileup image generation.
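The stable-sort change can be illustrated in Python rather than C++ (this is an analogy, not DeepVariant's code): Python's built-in sort is guaranteed stable, so elements that compare equal keep their input order, whereas an unstable sort such as `std::sort` may order ties differently from run to run.

```python
# Illustration of sort stability: reads sharing the same sort key
# (here, the integer position) keep their original relative order.
reads = [("readA", 5), ("readB", 3), ("readC", 5), ("readD", 3)]

# Sort by key only; ties are left in input order because Python's
# sorted() is stable, mirroring std::stable_sort.
rows = sorted(reads, key=lambda r: r[1])
# rows == [("readB", 3), ("readD", 3), ("readA", 5), ("readC", 5)]
```

With an unstable sort, `readA`/`readC` (and `readB`/`readD`) could swap between runs, producing slightly different pileup images for identical inputs; a stable sort removes that nondeterminism.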
Note: Some outputs (e.g. VCF) may still report v1.9 in the header as we did not update all version references.