Releases: broadinstitute/viral-pipelines

v2.1.32.4

02 Oct 00:54
65eb6fc

new features:

  • nextclade_version output string now includes nextclade datasets "tag" (version/date) [#371]
  • implement nextclade_multi_sample and pangolin_multi_sample tasks with Map outputs; switch sarscov2_batch_relineage and sarscov2_illumina_full to these multi-sample pangolin and nextclade tasks to increase compute efficiency and reduce shard counts [#368]
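
A rough illustration of why the multi-sample tasks reduce shard counts: a per-sample scatter launches one task call per genome, while a multi-sample task is called once for the whole batch and returns a map keyed by sample name. The sketch below is Python rather than WDL, and `classify_one`, `scatter_per_sample`, and `multi_sample` are hypothetical names used only for illustration; they are not the pipeline's actual tasks.

```python
from typing import Dict, List


def classify_one(sample: str) -> str:
    """Hypothetical stand-in for a lineage caller; returns a placeholder label."""
    return f"lineage-for-{sample}"


def scatter_per_sample(samples: List[str], call_log: List[List[str]]) -> Dict[str, str]:
    """One task invocation (shard) per sample."""
    results = {}
    for sample in samples:
        call_log.append([sample])  # each sample costs its own call
        results[sample] = classify_one(sample)
    return results


def multi_sample(samples: List[str], call_log: List[List[str]]) -> Dict[str, str]:
    """A single invocation that handles the batch and returns a map of results."""
    call_log.append(list(samples))  # the whole batch costs one call
    return {sample: classify_one(sample) for sample in samples}


if __name__ == "__main__":
    batch = ["s1", "s2", "s3"]
    per_sample_calls, batched_calls = [], []
    assert scatter_per_sample(batch, per_sample_calls) == multi_sample(batch, batched_calls)
    print(len(per_sample_calls), len(batched_calls))  # 3 1
```

Both approaches yield the same sample-to-result mapping; only the number of invocations differs.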

bug fixes:

  • rename detect_cross_contamination task wdl to be distinct from workflow name to fix dxWDL builds [#370]

vm/image updates:

  • update pangolin 3.1.11 to 3.1.14, update pangolearn 2021-09-17 to 2021-09-28, update nextclade 1.2.3 to 1.4.0 [#371]

v2.1.32.3

28 Sep 02:21
f5c0a0d

improvements:

  • sarscov2_biosample_load workflow: stop using today's date in constructing ftp directory path for NCBI BioSample submissions in order to allow call caching for jobs run on different days [#366]
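
Call caching can reuse a previous result only when a call's inputs match an earlier call, so any input that changes daily defeats it. A minimal sketch, assuming a toy cache keyed on a hash of the inputs (Cromwell's real hashing covers more than this); the function names and both directory layouts are hypothetical and not taken from the workflow.

```python
import hashlib
import json
from datetime import date


def cache_key(inputs: dict) -> str:
    """Toy cache key: a stable hash of the call inputs (not Cromwell's real scheme)."""
    payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def dated_inputs(batch_id: str, today: date) -> dict:
    # Hypothetical layout that embeds the run date in the FTP directory.
    return {"ftp_dir": f"submissions/{today.isoformat()}/{batch_id}"}


def stable_inputs(batch_id: str) -> dict:
    # Hypothetical layout that depends only on the batch identifier.
    return {"ftp_dir": f"submissions/{batch_id}"}


if __name__ == "__main__":
    day1, day2 = date(2021, 9, 27), date(2021, 9, 28)
    # Same batch on different days: the dated path changes the key, so no cache hit.
    print(cache_key(dated_inputs("batch1", day1)) == cache_key(dated_inputs("batch1", day2)))  # False
    # Without the date, the key is identical regardless of when the job runs.
    print(cache_key(stable_inputs("batch1")) == cache_key(stable_inputs("batch1")))  # True
```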

bug fixes:

  • fix for Array[Array[String]] alerts output variable from vadr task (update to new vadr output format) [#363]
  • edge case bug fix for nextstrain subsampling keep_list (was always mangling the first entry of a user-specified keep list) [#364]

Broad-specific:

  • add more external lab names to task crsp_meta_etl [#367]

minor VM/docker changes:

  • VM shape changes in nextstrain pipeline [#362]
  • pangolearn image update [#365]

v2.1.32.2

05 Sep 21:52
aad632b
  • bugfix: Broad dashboard output file should be txt not tsv [#360]
  • docker update to pangolearn 2021-08-24 [#359]
  • docker update to sc2-rmd:0.1.25 [#361]

v2.1.32.1

29 Aug 00:40
ff86ba2

bugfixes:

  • workflow sarscov2_nextstrain and sarscov2_nextstrain_aligned_input: bug fix to DAG -- ensure that treetime and ancestral inference are using masked alignments, not unmasked alignments [#358]
  • task crsp_meta_etl: add more possible values to the controlled vocabulary options for body_part [#357]

v2.1.32.0

18 Aug 23:07
550a71e

new features:

  • most task runtime blocks now support cromwell auto memory scaling/retry
  • automated data release and delivery (sarscov2_data_release)
  • batch re-calling of pango/nextclade lineages (sarscov2_batch_relineage)
  • improved automated BioSample registration and metadata handling from Broad CRSP samples and external non-Broad samples via GP pipeline
  • add sarscov2_biosample_load as optional subworkflow call at the beginning of sarscov2_illumina_full for fully automated use by Terra workflow launcher
  • updated/improved Picard-based illumina demux
  • move state public health reporting from sarscov2_illumina_full to sarscov2_data_release

minor updates to docker images and vm shapes:

  • pangolin 3.1.11 / pangolearn 2021-08-09
  • nextclade 1.2.3
  • vadr 1.3
  • nextstrain 20210413T201712Z
  • sc2-rmd, viral-core

build changes:

  • GitHub Actions CI added and is now primary; Travis CI remains active for the moment

v2.1.28.0

01 May 03:40
e1b71c2

new features:

  • new workflow sarscov2_nextstrain [#204, #208, #219]
  • updates to genbank submission [#201, #207]
  • update vadr alert criteria based on NCBI recommendations [#234, #249]
  • add nextclade tree outputs to sarscov2_illumina_full [#233]
  • add sequencing reports via rmarkdown (sarscov2_illumina_full) [#222, #226, #228, #235, #236, #244, #245, #248, #265]
  • ivar trim updates: emit ivar trim stats (assemble_refbased) and compute summary stats (sarscov2_illumina_full) [#237]
  • terra table upload and download [#206, #241]
  • add picard wgs metrics, alignment metrics, and insert size metrics to assemble_refbased and sarscov2_illumina_full [#239, #282]
  • add bucket delivery of data for CDC, SRA, and GP reporting to sarscov2_illumina_full [#258, #263, #278]
  • add tasks and workflows for NCBI BioSample registration and metadata retrieval [#279]
  • automated filtering of libraries from failed NTC controls [#266]

bug fixes:

  • fix whitespace handling in gzcat task [#230]
  • deduplicate output rows from sra_meta_prep [#220]
  • GISAID metadata output should be CSV not TSV [#273]
  • derive Illumina run ID from XML instead of tarball filename [#275]

build changes:

  • bump cromwell and womtool 54 to 61 [#272]
  • temporarily drop dnanexus builds from Travis until we clean up the dnanexus CI project [#280, #283]

v2.1.19.0

26 Jan 14:58
ff81708

Added new workflow: sarscov2_sra_to_genbank -- this takes sequencing reads from INSDC (via NCBI SRA), assembles, annotates, and QCs genomes, and produces Genbank and GISAID submission bundles based on the metadata in NCBI (SRA and BioSample). The Genbank submission will be tied to the same source BioProject and BioSamples that the reads were linked to in SRA. This workflow is able to merge together multiple read sets (SRA records) from the same BioSample and produce one assembly per BioSample. It will automatically detect sequencing platform (only Illumina and Oxford Nanopore currently supported) as well as amplicon vs metagenomic library designs based on the SRA metadata, and assemble appropriately. This has been tested on Illumina reads, ONT reads, amplicon libraries, metagenomic libraries, reads submitted to NCBI SRA, and reads originally submitted to ENA and synced with NCBI. [#197, #200]
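
The "one assembly per BioSample" behavior amounts to grouping read sets by their BioSample before assembly. A minimal sketch of that grouping step only, assuming each record is a (run accession, BioSample accession) pair; the function name and the placeholder accessions are hypothetical, and the workflow's own implementation is not shown in these notes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_runs_by_biosample(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect run accessions under the BioSample they belong to, preserving input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for run_accession, biosample in records:
        groups[biosample].append(run_accession)
    return dict(groups)


if __name__ == "__main__":
    # Placeholder identifiers, not real accessions.
    records = [("RUN_A", "SAMPLE_1"), ("RUN_B", "SAMPLE_2"), ("RUN_C", "SAMPLE_1")]
    print(group_runs_by_biosample(records))
    # {'SAMPLE_1': ['RUN_A', 'RUN_C'], 'SAMPLE_2': ['RUN_B']}
```

Each key would then correspond to one assembly built from the merged reads of its runs.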

Minor changes and fixes to sarscov2_illumina_full:

  • filter genbank/gisaid submission packages to only sequences present in biosample attributes file [#200]
  • relax minimum genome unambig bp cutoff from 20kb to 15kb [#200]
  • allow for merging multiple biosample attributes tsvs together in sarscov2_illumina_full [#200]
  • add "Sequencing Technology" column to both genbank and gisaid submission packages [#200]
  • greatly simplify the final assembly metrics metadata output from both workflows (single tsv instead of compound array structures) [#200]
  • make output filenames a bit more organized [#200]
  • expose cleaned_bam_uris text file output for easy SRA submission [#200]
  • replace the first several steps with an invocation of demux_deplete as a subworkflow to reduce code duplication [#197]
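
To make the relaxed genome-length cutoff in the list above concrete, here is a minimal sketch of an unambiguous-base filter. It assumes "unambiguous" means A, C, G, or T and that the comparison is inclusive; neither detail is stated in these notes, and the function names are hypothetical.

```python
UNAMBIGUOUS_BASES = frozenset("ACGT")  # assumption: everything else (N, IUPAC codes, gaps) is ambiguous


def count_unambiguous(seq: str) -> int:
    """Count A/C/G/T characters, ignoring case."""
    return sum(1 for base in seq.upper() if base in UNAMBIGUOUS_BASES)


def passes_length_filter(seq: str, min_unambig: int = 15000) -> bool:
    """True if the genome has at least min_unambig unambiguous bases (inclusive, by assumption)."""
    return count_unambiguous(seq) >= min_unambig


if __name__ == "__main__":
    # Synthetic genome: 16,000 called bases plus 13,000 Ns.
    genome = "ACGT" * 4000 + "N" * 13000
    print(passes_length_filter(genome, min_unambig=20000))  # False under the old 20kb cutoff
    print(passes_length_filter(genome, min_unambig=15000))  # True under the relaxed 15kb cutoff
```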

Other minor changes:

  • sarscov2_lineages and sarscov2_illumina_full: rename output variable pangolin_clade to pango_lineage to stay in line with the nomenclature of the PANGOLIN authors. [#197]
  • increase default RAM for GATK UG consensus calling in assemble_refbased from 7GB to 15GB. [#200]
  • bump nextclade image and pangoLEARN database to latest [#198]. nextclade update improves deletion variant naming. pangolin update keeps up with latest lineage assignments.
  • bump viral-core docker 2.1.18 to 2.1.19 to fix demux scenario with single-index/paired-reads [#199]

v2.1.18.0

17 Jan 00:59
799e054

New general workflows:

  • new workflow demux_deplete. This sits between demux_only (demux and fastqc) and demux_plus (which adds kraken, spades, etc) and just does demux, fastqc, and depletion. If optionally given "augmented" samplesheets and NCBI BioSample mappings, it will produce SRA submission bundles as well. [#191]
  • new workflow mafft_and_snp_annotated, which adds snpEff annotation to the snp-sites output [#194]

New SARS-CoV-2 specific workflows. Up until this release, all included workflows were generally applicable to most viral taxa. This release includes a number of single-taxon workflows exclusively for SARS-CoV-2 in order to increase efficiency for high throughput work on this one virus.

  • new workflow sarscov2_illumina_full is a full end-to-end workflow from Illumina BCL tarball through Genbank, SRA, and GISAID submission bundles. It wraps together demux_deplete, assemble_refbased, sarscov2_lineages, sarscov2_genbank. It requires the user to pre-register NCBI BioSample entries and to provide an "augmented" samplesheet for demux. [#191, #196]

  • new workflow sarscov2_genbank. Prepares single-segmented genome assemblies for submission to NCBI Genbank using their new SARS-CoV-2 submission mechanism (which may become more mainstream for other viruses as well). Incorporates the new VADR (Viral Annotation DefineR) tool from NCBI to annotate (produce tbl files) and QC (flag frameshift and other problems) using the same settings that Genbank uses for QC -- this filters out genomes from submission that fail VADR QC and should result in Genbank submissions with no rejections. [#191]

  • new workflows sarscov2_lineages and sarscov2_nextclade_multi. These run Nextclade and Pangolin to do lineage/clade classification on SARS-CoV-2 genomes. [#184, #185, #186]

  • nextstrain/augur workflow improvements and bugfixes to allow for merging of multiple metadata tsv files. This simplifies the process of regular builds where some data is changing frequently [#189, #181, #191]

  • docker image updates: [#195, #190, #187, #182, #191, #193]

  • VM shape updates [#188]

  • README update with diagram [#183, @llangit-broad]

v2.1.12.0

14 Dec 22:24
87066a4

Fixes:

  • remove recursion limit for finding the RunInfo.xml file when unpacking sequencing run tarballs in illumina_demux [#179]

Updated:

  • bump viral docker base layers to images based on viral-core 2.1.12 [#180]

v2.1.10.0

04 Dec 16:53
ba04a93

New or changed WDL workflows:

  • new workflow: subsample_by_metadata_with_focal [#161]

Changes:

  • rename workflow augur_from_newick to augur_export_only, add new workflow augur_from_mltree [#151]
  • drop support for trinity (pinned version) assembler [#168]
  • bump upstream docker images [#169, #163]
  • README changes [#162]

Fixes:

  • bugfix: isnvs_per_sample when specifying optional parameters [#160]
  • bugfixes for use of set -o pipefail [#175, #173]
  • bugfix: optional input handling in filter_bam_to_taxa [#171]
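
For context on the `set -o pipefail` fixes listed above: with pipefail enabled, bash reports a pipeline as failed if any stage exits non-zero, not just the last one. A common side effect is that a `grep` with no matches (exit status 1) now fails the whole command. The sketch below demonstrates this from Python and assumes `bash` is on the PATH; the pipeline is a generic example, since these notes do not say which commands were affected, and `pipeline_exit_code` is a hypothetical helper.

```python
import subprocess


def pipeline_exit_code(script: str, pipefail: bool) -> int:
    """Run a shell pipeline under bash, optionally with pipefail, and return its exit status."""
    prefix = "set -o pipefail; " if pipefail else ""
    completed = subprocess.run(
        ["bash", "-c", prefix + script],
        capture_output=True,  # keep the pipeline's own output off the console
        text=True,
    )
    return completed.returncode


if __name__ == "__main__":
    # grep finds no match and exits 1; wc still succeeds and prints 0.
    script = r"printf 'a\n' | grep b | wc -l"
    print(pipeline_exit_code(script, pipefail=False))  # 0: only the last stage counts
    print(pipeline_exit_code(script, pipefail=True))   # 1: grep's failure propagates
```

One usual remedy is to tolerate the expected non-zero status explicitly for that stage while keeping pipefail on for everything else.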

VM shape parameterization or changes to defaults:

  • mafft parameterization [#170]
  • memory increase to task filter_sequences_to_list [#159]
  • parameterize BEAST GPU settings [#158]
  • memory increase to assemble_refbased (specifically to task align_reads) [#157]
  • memory increase to task multi_align_mafft [#155]
  • memory increase and cpu decrease to tasks refine_augur_tree and ancestral_tree (the timetree invocations) [#153]
  • parameterize CPU count for task draft_augur_tree (iqtree) [#150]

DNAnexus:

  • update DNAnexus demux_launcher to v2 instance types [#156]
  • update instance types to v2 [#167]