
VS-1178 merge master into ah var store again #9135


Open: wants to merge 174 commits into base: ah_var_store


gbggrant (Collaborator) commented Apr 1, 2025

Rerunning Chr20/X/Y Integration Test here
Passing All Chromosomes Integration Test here

jamesemery and others added 30 commits October 11, 2023 10:01
* hmer ondel must have mon length

* Revert "hmer ondel must have mon length"

This reverts commit 7852871.

* remove superfluous variant type condition

* fix error message to actually reflect missing argument

* fixed unittest to include variant type

* Remove conflict
* Additional fix + logging fixes
* Added missing initialization

* Add option for keeping disjoint mates in ASM

* Better name and fixing reports

* Finish fixing report

* Fix report name
New Tool: GroundTruthScorer
Update: FlowFeatureMapper
…dependency on the ADAM library (#8606)

* Add a native GATK implementation for 2bit references, with comprehensive unit tests

* For now, this is only hooked up to the Spark codepath, but it could easily be hooked up to ReferenceDataSource and the Walker codepath as well

* Remove the dependency on the ADAM library, to resolve conflicts with future dependency upgrades
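As background for the 2bit commit above, here is a minimal Python sketch of how the UCSC .2bit format packs sequence data, following the published UCSC 2bit specification: four bases per byte, two bits each, first base in the most significant bits, with T=00, C=01, A=10, G=11. It is not GATK's Java implementation. It decodes only one record's packedDna and applies its N blocks, ignores lowercase mask blocks, and the function names are hypothetical.

```python
# Two-bit codes from the UCSC .2bit specification: T=00, C=01, A=10, G=11.
TWOBIT_BASES = "TCAG"


def unpack_2bit(packed: bytes, dna_size: int) -> str:
    """Decode the first dna_size bases of a record's packedDna bytes.

    Each byte holds four bases; the first base sits in the two most
    significant bits.
    """
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append(TWOBIT_BASES[(byte >> shift) & 0b11])
    # The last byte may be padded, so trim to the declared sequence length.
    return "".join(bases[:dna_size])


def apply_n_blocks(seq: str, starts, sizes) -> str:
    """Overwrite runs of N from nBlockStarts/nBlockSizes (assumed 0-based starts)."""
    chars = list(seq)
    for start, size in zip(starts, sizes):
        chars[start:start + size] = "N" * size
    return "".join(chars)


# 0x1B = 0b00_01_10_11 -> T C A G; 0xE4 = 0b11_10_01_00 -> G A C T
seq = unpack_2bit(bytes([0x1B, 0xE4]), 6)
print(seq)                            # TCAGGA
print(apply_n_blocks(seq, [1], [2]))  # TNNGGA
```

Packing two bits per base is what makes the format compact, which is also why N runs and soft-masking have to be stored separately as block lists rather than in the base codes.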
…curity scanner to build.gradle (#8607)

* Updated many GATK dependencies to address known security vulnerabilities

* Added a security scanner to build.gradle

* There are still some remaining vulnerabilities in GATK dependencies, but this eliminates most of them
* Update http-nio and wire it so it's configured at startup along with GCS settings.
* New experimental tool to print out human-readable file diagnostics for cram/crai/bai files.
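To give a sense of what such diagnostics can report for a .bai file, here is a minimal Python sketch that walks the BAI layout described in the SAM/BAM format specification (magic `BAI\1`, then per-reference bins, chunks, and linear-index intervals) and counts entries per reference. It is a hypothetical illustration, not the new GATK tool's code or output: `summarize_bai` is an invented name, and it ignores the optional trailing count of unplaced reads.

```python
import struct


def summarize_bai(data: bytes) -> list:
    """Return (n_bin, n_chunk_total, n_intv) for each reference in a BAI index."""
    if data[:4] != b"BAI\x01":
        raise ValueError("not a BAI index: bad magic")
    offset = 4
    (n_ref,) = struct.unpack_from("<i", data, offset)
    offset += 4
    summary = []
    for _ in range(n_ref):
        (n_bin,) = struct.unpack_from("<i", data, offset)
        offset += 4
        n_chunk_total = 0
        for _ in range(n_bin):
            # bin id (uint32) followed by the number of chunks (int32)
            _bin_id, n_chunk = struct.unpack_from("<Ii", data, offset)
            offset += 8
            n_chunk_total += n_chunk
            offset += 16 * n_chunk  # skip chunk_beg/chunk_end virtual offsets
        (n_intv,) = struct.unpack_from("<i", data, offset)
        offset += 4 + 8 * n_intv  # skip the linear-index virtual offsets
        summary.append((n_bin, n_chunk_total, n_intv))
    return summary


# A tiny synthetic index: one reference, one bin with one chunk, two intervals.
fake = struct.pack("<4siiIiQQiQQ", b"BAI\x01", 1, 1, 4681, 1, 0, 100, 2, 0, 0)
print(summarize_bai(fake))  # [(1, 1, 2)]
```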
…#8438)

* GATK's lack of support for az:// URIs means that although GenomicsDB can
  natively read them, parts of the Java code crash when interacting with them
* Adding --avoid-nio and --header arguments
  These allow disabling all of the Java interaction with the az:// links
  and simply passing them through to GenomicsDB
  This disables some safeguards but allows operating on files in Azure
* Update GenomicsDB version to 1.5.1 for improved Azure support

* There are no direct tests on Azure since we do not yet have any infrastructure
  to generate the necessary tokens; there is a disabled test which requires
  #8612 before we can enable it.

---------

Co-authored-by: Nalini Ganapati <[email protected]>
To allow variable ploidy across regions, for example making haploid calls outside the PAR on chrX or chrY,
there is now a --ploidy-regions flag. The -ploidy flag sets the default ploidy to use everywhere, and --ploidy-regions
should be a .bed or .interval_list whose "name" column contains the desired ploidy to use in that region
when genotyping. Note that variants near a region boundary may not get the matching ploidy, since the ploidy used is determined by the following precedence:

* ploidy given in --ploidy-regions for any intervals overlapping the active region in which your variant is called
  (with ties broken by using the largest ploidy); note that a ploidy interval need only overlap the active region to determine
  the ploidy of your variant, even if the end coordinate written for your variant lies outside the given region
* ploidy given via the global -ploidy flag
* ploidy determined by the built-in global default for humans (2).
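The precedence rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not GATK's HaplotypeCaller code: `PloidyRegion` and `resolve_ploidy` are hypothetical names, intervals are treated as 1-based closed ranges, and the example coordinates are made up.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

HUMAN_DEFAULT_PLOIDY = 2  # built-in fallback when no ploidy is supplied


@dataclass(frozen=True)
class PloidyRegion:
    """One --ploidy-regions interval; ploidy comes from its "name" column."""
    contig: str
    start: int  # 1-based, inclusive (assumed)
    end: int    # inclusive
    ploidy: int


def resolve_ploidy(contig: str, active_start: int, active_end: int,
                   regions: Sequence[PloidyRegion],
                   global_ploidy: Optional[int] = None) -> int:
    """Pick the ploidy for a variant called in the given active region."""
    # 1. Any ploidy region that merely overlaps the active region counts,
    #    even if the variant itself ends outside that region.
    overlapping = [r.ploidy for r in regions
                   if r.contig == contig
                   and r.start <= active_end and active_start <= r.end]
    if overlapping:
        return max(overlapping)  # ties broken by taking the largest ploidy
    # 2. Otherwise fall back to the global -ploidy value, if one was given.
    if global_ploidy is not None:
        return global_ploidy
    # 3. Finally, use the built-in human default.
    return HUMAN_DEFAULT_PLOIDY


# Hypothetical haploid region on chrX plus an overlapping diploid region.
regions = [PloidyRegion("chrX", 1000, 5000, 1), PloidyRegion("chrX", 4000, 6000, 2)]
print(resolve_ploidy("chrX", 1200, 1300, regions))                  # 1
print(resolve_ploidy("chrX", 4500, 4600, regions))                  # 2
print(resolve_ploidy("chr20", 100, 200, regions, global_ploidy=3))  # 3
print(resolve_ploidy("chr20", 100, 200, regions))                   # 2
```

Resolving against the whole active region rather than the individual variant is what produces the boundary effect noted above: an active region that only clips a ploidy interval still inherits its ploidy.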

---------

Co-authored-by: Ty Kay <[email protected]>
Co-authored-by: rickymagner <[email protected]>
* Update the GATK base image to the latest Ubuntu LTS release (22.04)

* Add some additional useful utilities to the base image

* Switch to a newer conda version with a much faster solver

* Update the scripts and documentation for building the base image

* Update the VETS integration tests to allow for a small epsilon during numeric comparisons, and include the full diff output in exception messages when a mismatch is detected
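The epsilon-tolerant comparison described above can be sketched in Python like this. It is a hypothetical helper, not the actual VETS test code: it assumes tab-separated text output, compares numeric fields with an absolute tolerance, requires non-numeric fields to match exactly, and puts the full unified diff into the exception message.

```python
import difflib
import math


def _fields_match(expected: str, actual: str, epsilon: float) -> bool:
    """Exact match, or both fields numeric and within epsilon of each other."""
    if expected == actual:
        return True
    try:
        return math.isclose(float(expected), float(actual),
                            rel_tol=0.0, abs_tol=epsilon)
    except ValueError:
        return False  # at least one field is non-numeric, so they differ


def assert_lines_match(expected_lines, actual_lines, epsilon=1e-6):
    """Raise AssertionError with the full diff if any line differs beyond epsilon."""
    def mismatch(reason: str) -> AssertionError:
        diff = "\n".join(difflib.unified_diff(
            expected_lines, actual_lines, "expected", "actual", lineterm=""))
        return AssertionError(f"{reason}\n{diff}")

    if len(expected_lines) != len(actual_lines):
        raise mismatch("line counts differ")
    for line_no, (exp, act) in enumerate(zip(expected_lines, actual_lines), start=1):
        exp_fields, act_fields = exp.split("\t"), act.split("\t")
        if len(exp_fields) != len(act_fields) or not all(
                _fields_match(e, a, epsilon) for e, a in zip(exp_fields, act_fields)):
            raise mismatch(f"mismatch at line {line_no}")


assert_lines_match(["chr1\t0.1234567"], ["chr1\t0.1234568"])  # within 1e-6: passes
```

Building the diff only when a mismatch is found keeps the passing path cheap while still giving the full context in a failure message.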
…oud-based docker build, and add a release script (#8247)

* Added a -r argument to build_docker_remote.sh to toggle the RELEASE flag during
  docker builds

* Added a release_prebuilt_docker_image.sh to release a prebuilt docker image to the
  official repos
* update to htsjdk 4.1.0, which enables http-nio in more cases
* remove several test cases handling GenomicsDB path parsing which were testing nonsensical paths that are now illegal
* This should make http access seamless in many places

* The way this handles query parameters is not ideal for signed URL cases, so we'll need to revisit that
…ervals output (#8621)

* Write gCNV interval output ID=GT header as Type=String

Incorrectly writing this as Type=Integer causes bcftools to misparse
the genotype field.

* Use correct header types and numbers in test VCF file
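For reference, the VCF specification reserves the GT FORMAT key with Number=1 and Type=String, which is what this fix restores. A minimal Python sketch of a header check that would catch the old Type=Integer declaration might look like this; `check_gt_header` is a hypothetical helper, and the regex assumes the ID, Number, and Type keys appear in that canonical order.

```python
import re

# Matches the leading ID/Number/Type keys of a ##FORMAT header line.
FORMAT_LINE = re.compile(r"^##FORMAT=<ID=([^,>]+),Number=([^,>]+),Type=([^,>]+)")


def check_gt_header(header_lines):
    """Return a list of problems with the GT FORMAT declaration (empty if OK)."""
    for line in header_lines:
        match = FORMAT_LINE.match(line)
        if match and match.group(1) == "GT":
            number, value_type = match.group(2), match.group(3)
            problems = []
            if value_type != "String":
                problems.append(f"GT declared as Type={value_type}; expected Type=String")
            if number != "1":
                problems.append(f"GT declared as Number={number}; expected Number=1")
            return problems
    return ["no ##FORMAT line declares GT"]


bad = ['##FORMAT=<ID=GT,Number=1,Type=Integer,Description="Genotype">']
good = ['##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">']
print(check_gt_header(bad))   # ['GT declared as Type=Integer; expected Type=String']
print(check_gt_header(good))  # []
```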
* include normal seq error log likelihood in Permutect dataset

* handle different allele representations in multiallelic / indel variants for Permutect training data mode

* set the default artifact to non-artifact ratio to 1 in Permutect training data mode
gbggrant marked this pull request as ready for review April 20, 2025 11:04
mcovarr (Collaborator) commented Apr 22, 2025

There seems to be a new high-severity security vulnerability error on this branch that doesn't exist in master or ah_var_store. The only difference between this branch and master in the file in question is that this branch removed two unused imports; I pushed a branch of master that removed those imports and didn't see the vulnerability error. I'm trying a fresh merge of master into this branch to see if recent build.gradle changes clear the error.

@@ -295,7 +291,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
wdlTest: [ 'RUN_CNV_GERMLINE_COHORT_WDL', 'RUN_CNV_GERMLINE_CASE_WDL', 'RUN_CNV_SOMATIC_WDL', 'RUN_M2_WDL', 'RUN_CNN_WDL', 'RUN_VCF_SITE_LEVEL_FILTERING_WDL' ]
Collaborator:

ah that's how the CNN WDL failures were fixed on master 😄

#### Build the gatkbase docker image remotely using Google Cloud Build:

```bash
build_docker_base_cloud.sh <docker_image_version>
```

Collaborator:

fancy!

@@ -264,7 +264,7 @@ def convert_array_with_id_keys_to_dense_array(arr, ids, drop=[]):
 var_ht = hl.import_avro(var_group)
 var_ht = var_ht.transmute(locus=translate_locus(var_ht.location),
 local_alleles=hl.array([var_ht.ref]).extend(var_ht.alt.split(',')),
-LGT=hl.parse_call(var_ht.GT),
+LGT=hl.parse_call(hl.or_missing((hl.is_missing(var_ht.GQ) | (var_ht.GQ != 0)), var_ht.GT)),
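The changed `LGT` line is written to keep a genotype only when GQ is missing or non-zero; when GQ is 0, `hl.or_missing` yields a missing value, so the call is dropped. The plain-Python sketch below mirrors that per-genotype rule without Hail. `mask_gq0_gt` is a hypothetical name, `None` stands in for Hail's missing value, and Hail's own missingness propagation is not reproduced.

```python
from typing import Optional


def mask_gq0_gt(gt: Optional[str], gq: Optional[int]) -> Optional[str]:
    """Plain-Python reading of
    hl.or_missing(hl.is_missing(GQ) | (GQ != 0), GT):
    keep GT when GQ is missing or non-zero, otherwise return missing (None).
    """
    keep = gq is None or gq != 0  # assumption: a missing GQ keeps the call
    return gt if keep else None


rows = [("0/1", 0), ("0/1", None), ("1/1", 30)]
print([mask_gq0_gt(gt, gq) for gt, gq in rows])  # [None, '0/1', '1/1']
```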
Collaborator:

Was there a ticket made to address backpatching the Echo VDS GQ 0 genotypes in merge_and_rescore.py? I went looking for such a ticket but didn't find one...

