Releases: broadinstitute/viral-pipelines
v2.1.32.4
new features:
- nextclade_version output string now includes nextclade datasets "tag" (version/date) [#371]
- implement nextclade_multi_sample and pangolin_multi_sample with Map task outputs, switch sarscov2_batch_relineage and sarscov2_illumina_full to use multi_sample pangolin and nextclade tasks to increase compute efficiency and reduce shard counts [#368]
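A rough illustration of why a multi-sample task with Map outputs reduces shard counts, written in Python rather than WDL. This is only a sketch: `classify_stub`, `per_sample`, and `multi_sample` are hypothetical names, and the stub does not call pangolin or nextclade.

```python
from typing import Dict, List


def classify_stub(genome: str) -> str:
    """Hypothetical stand-in for a lineage caller (NOT pangolin/nextclade)."""
    return f"lineage-of-{genome}"


def per_sample(genomes: List[str], job_sizes: List[int]) -> Dict[str, str]:
    """Scatter-style: one job (shard) is launched per genome."""
    results = {}
    for g in genomes:
        job_sizes.append(1)  # record a job that processed a single genome
        results[g] = classify_stub(g)
    return results


def multi_sample(genomes: List[str], job_sizes: List[int]) -> Dict[str, str]:
    """Batch-style: one job handles every genome and returns a map keyed by sample."""
    job_sizes.append(len(genomes))  # record a single job for the whole batch
    return {g: classify_stub(g) for g in genomes}


genomes = ["sampleA", "sampleB", "sampleC"]
scatter_jobs: List[int] = []
batch_jobs: List[int] = []
scatter_out = per_sample(genomes, scatter_jobs)
batch_out = multi_sample(genomes, batch_jobs)
```

Both approaches yield the same per-sample mapping, but the batch version launches one job instead of one per genome, so fixed per-job overhead (VM start-up, image pull) is paid once.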
bug fixes:
- rename detect_cross_contamination task wdl to be distinct from workflow name to fix dxWDL builds [#370]
vm/image updates:
- update pangolin 3.1.11 to 3.1.14, update pangolearn 2021-09-17 to 2021-09-28, update nextclade 1.2.3 to 1.4.0 [#371]
v2.1.32.3
improvements:
- sarscov2_biosample_load workflow: stop using today's date in constructing ftp directory path for NCBI BioSample submissions in order to allow call caching for jobs run on different days [#366]
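A simplified model of why a date-derived path defeats call caching, sketched in Python. It assumes a cache keyed on a hash of the call's inputs; Cromwell's real hashing covers more than this. The path layout and function names are hypothetical, not the workflow's actual values.

```python
import hashlib
import json
from datetime import date


def cache_key(inputs: dict) -> str:
    """Toy cache key: a SHA-256 digest of the call's inputs (assumed model)."""
    blob = json.dumps(inputs, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


def ftp_dir_with_date(batch: str, run_date: date) -> str:
    """Hypothetical path layout that embeds the run date."""
    return f"submissions/{run_date.isoformat()}/{batch}"


def ftp_dir_stable(batch: str) -> str:
    """Hypothetical path layout that depends only on the batch name."""
    return f"submissions/{batch}"


day1, day2 = date(2021, 10, 1), date(2021, 10, 2)

# Same batch on two different days: the dated path changes the inputs and the key.
dated_key_1 = cache_key({"ftp_dir": ftp_dir_with_date("batch42", day1)})
dated_key_2 = cache_key({"ftp_dir": ftp_dir_with_date("batch42", day2)})

# The date-free path gives identical inputs, so a rerun can reuse the cached call.
stable_key_1 = cache_key({"ftp_dir": ftp_dir_stable("batch42")})
stable_key_2 = cache_key({"ftp_dir": ftp_dir_stable("batch42")})
```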
bug fixes:
- fix for Array[Array[String]] alerts output variable from vadr task (update to new vadr output format) [#363]
- edge case bug fix for nextstrain subsampling keep_list (was always mangling the first entry of a user-specified keep list) [#364]
Broad-specific:
- add more external lab names to task crsp_meta_etl [#367]
minor VM/docker changes:
v2.1.32.2
v2.1.32.1
bugfixes:
v2.1.32.0
new features:
- most task runtime blocks now support cromwell auto memory scaling/retry
- automated data release and delivery (sarscov2_data_release)
- batch recalling of pango/nextclade lineages (sarscov2_batch_relineage)
- improved automated BioSample registration and metadata handling from Broad CRSP samples and external non-Broad samples via GP pipeline
- add sarscov2_biosample_load as optional subworkflow call at the beginning of sarscov2_illumina_full for fully automated use by Terra workflow launcher
- updated/improved Picard-based illumina demux
- move state public health reporting from sarscov2_illumina_full to sarscov2_data_release
bug fixes:
minor updates to docker images and vm shapes:
- pangolin 3.1.11 / pangolearn 2021-08-09
- nextclade 1.2.3
- vadr 1.3
- nextstrain 20210413T201712Z
- sc2-rmd, viral-core
build changes:
- GitHub Actions CI added, now primary. Travis CI still active at the moment
v2.1.28.0
new features:
- new workflow sarscov2_nextstrain [#204, #208, #219]
- updates to genbank submission [#201, #207]
- update vadr alert criteria based on NCBI recommendations [#234, #249]
- add nextclade tree outputs to sarscov2_illumina_full [#233]
- add sequencing reports via rmarkdown (sarscov2_illumina_full) [#222, #226, #228, #235, #236, #244, #245, #248, #265]
- ivar trim updates: emit ivar trim stats (assemble_refbased) and compute summary stats (sarscov2_illumina_full) [#237]
- terra table upload and download [#206, #241]
- add picard wgs metrics, alignment metrics, and insert size metrics to assemble_refbased and sarscov2_illumina_full [#239, #282]
- add bucket delivery of data for CDC, SRA, and GP reporting to sarscov2_illumina_full [#258, #263, #278]
- add tasks and workflows for NCBI BioSample registration and metadata retrieval [#279]
- automated filtering of libraries from failed NTC controls [#266]
bug fixes:
- bugfix whitespace handling in gzcat task [#230]
- deduplicate output rows from sra_meta_prep [#220]
- GISAID metadata output should be CSV not TSV [#273]
- derive Illumina run ID from XML instead of tarball filename [#275]
minor updates to docker images and vm shapes:
- vm shape updates on augur steps [#205, #224, #225, #229, #232]
- bump viral-core docker [#242, #243, #268, #270, #274, #276, #281, #284]
- bump ivar docker [#209]
- bump pangolin/pangolearn [#203, #205, #210, #213, #214, #215, #216, #217, #218, #240, #250, #254, #267, #271, #285]
- bump vadr docker [#216, #264]
- bump nextstrain/base [#238]
- bump sc2-rmd docker [#269, #277]
- update nextmeta tsv output behavior to match new nextmeta spellings [#231]
build changes:
v2.1.19.0
Added new workflow: sarscov2_sra_to_genbank -- this takes sequencing reads from INSDC (via NCBI SRA), assembles, annotates, and QCs genomes, and produces Genbank and GISAID submission bundles based on the metadata in NCBI (SRA and BioSample). The Genbank submission will be tied to the same source BioProject and BioSamples that the reads were linked to in SRA. This workflow is able to merge together multiple read sets (SRA records) from the same BioSample and produce one assembly per BioSample. It will automatically detect sequencing platform (only Illumina and Oxford Nanopore currently supported) as well as amplicon vs metagenomic library designs based on the SRA metadata, and assemble appropriately. This has been tested on Illumina reads, ONT reads, amplicon libraries, metagenomic libraries, reads submitted to NCBI SRA, and reads originally submitted to ENA and synced with NCBI. [#197, #200]
Minor changes and fixes to sarscov2_illumina_full:
- filter genbank/gisaid submission packages to only sequences present in biosample attributes file [#200]
- relax minimum genome unambig bp cutoff from 20kb to 15kb [#200]
- allow for merging multiple biosample attributes tsvs together in sarscov2_illumina_full [#200]
- add "Sequencing Technology" column to both genbank and gisaid submission packages [#200]
- greatly simplify the final assembly metrics metadata output from both workflows (single tsv instead of compound array structures) [#200]
- makes filename outputs a bit more organized [#200]
- exposes cleaned_bam_uris text file output for easy SRA submission [#200]
- replace the first several steps with an invocation of demux_deplete as a subworkflow to reduce code duplication [#197]
Other minor changes:
- sarscov2_lineages and sarscov2_illumina_full: rename output variable pangolin_clade to pango_lineage to stay in line with the nomenclature of the PANGOLIN authors. [#197]
- increase default RAM for GATK UG consensus calling in assemble_refbased from 7GB to 15GB. [#200]
- bump nextclade image and pangoLEARN database to latest [#198]. nextclade update improves deletion variant naming. pangolin update keeps up with latest lineage assignments.
- bump viral-core docker 2.1.18 to 2.1.19 to fix demux scenario with single-index/paired-reads [#199]
v2.1.18.0
New general workflows:
- new workflow demux_deplete. This sits between demux_only (demux and fastqc) and demux_plus (which adds kraken, spades, etc) and just does demux, fastqc, and depletion. If optionally given "augmented" samplesheets and NCBI BioSample mappings, it will produce SRA submission bundles as well. [#191]
- new workflow mafft_and_snp_annotated, which adds snpEff annotation to the snp-sites output [#194]
New SARS-CoV-2 specific workflows. Up until this release, all included workflows were generally applicable to most viral taxa. This release includes a number of single-taxon workflows exclusively for SARS-CoV-2 in order to increase efficiency for high throughput work on this one virus.
- new workflow sarscov2_illumina_full is a full end-to-end workflow from Illumina BCL tarball through Genbank, SRA, and GISAID submission bundles. It wraps together demux_deplete, assemble_refbased, sarscov2_lineages, and sarscov2_genbank. It requires the user to pre-register NCBI BioSample entries and to provide an "augmented" samplesheet for demux. [#191, #196]
- new workflow sarscov2_genbank. Prepares single-segmented genome assemblies for submission to NCBI Genbank using their new SARS-CoV-2 submission mechanism (which may become more mainstream for other viruses as well). Incorporates the new VADR (Viral Annotation DefineR) tool from NCBI to annotate (produce tbl files) and QC (flag frameshift and other problems) using the same settings that Genbank uses for QC -- this filters out genomes from submission that fail VADR QC and should result in Genbank submissions with no rejections. [#191]
- new workflows sarscov2_lineages and sarscov2_nextclade_multi. These run Nextclade and Pangolin to do lineage/clade classification on SARS-CoV-2 genomes. [#184, #185, #186]
- nextstrain/augur workflow improvements and bugfixes to allow for merging of multiple metadata tsv files. This simplifies the process of regular builds where some data is changing frequently [#189, #181, #191]
- VM shape updates [#188]
- README update with diagram [#183, @llangit-broad]
v2.1.12.0
v2.1.10.0
New or changed WDL workflows:
- new workflow: subsample_by_metadata_with_focal [#161]
Changes:
- rename workflow augur_from_newick to augur_export_only, add new workflow augur_from_mltree [#151]
- drop support for trinity (pinned version) assembler [#168]
- bump upstream docker images [#169, #163]
- README changes [#162]
Fixes:
- bugfix: isnvs_per_sample when specifying optional parameters [#160]
- bugfixes for use of set -o pipefail [#175, #173]
- bugfix: optional input handling in filter_bam_to_taxa [#171]
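For context on the pipefail fixes, this short Python sketch shows how `set -o pipefail` changes a bash pipeline's exit status. It demonstrates general bash behavior only, not the specific bugs addressed in #175 or #173. It assumes `bash` is on the PATH, and the helper name `pipeline_exit_code` is hypothetical.

```python
import subprocess


def pipeline_exit_code(script: str) -> int:
    """Run a script with bash and return its exit status (assumes bash is installed)."""
    return subprocess.run(["bash", "-c", script]).returncode


# Without pipefail, the pipeline's status is that of the LAST command (true -> 0),
# so the failure of the first command is silently ignored.
without_pipefail = pipeline_exit_code("false | true")

# With pipefail, any failing command in the pipeline makes the whole pipeline fail.
with_pipefail = pipeline_exit_code("set -o pipefail; false | true")
```

A side effect worth knowing: once pipefail is on, pipelines that rely on an early stage exiting non-zero (for example a `grep` that finds no match) start failing the whole command.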
VM shape parameterization or changes to defaults:
- mafft parameterization [#170]
- memory increase to task filter_sequences_to_list [#159]
- parameterize BEAST GPU settings [#158]
- memory increase to assemble_refbased (specifically to task align_reads) [#157]
- memory increase to task multi_align_mafft [#155]
- memory increase & cpu decrease to task refine_augur_tree and ancestral_tree (the timetree invocations) [#153]
- parameterize CPU count for task draft_augur_tree (iqtree) [#150]
DNAnexus: