Releases: sourmash-bio/sourmash_plugin_directsketch

v0.6.3

29 Apr 21:32
11e9b81

This bugfix release renames the batch list file written when building batched zipfiles from {output}.batches.txt to {output}.batchlist.txt, so that the filename matches the documentation.
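A downstream script that collects batch outputs might read the batch list as sketched here. This is only a minimal illustration: it assumes the file holds one batch path per line (the notes do not spell out the format) and that the suffix is appended to the output name exactly as given. `read_batch_list` is a hypothetical helper, not part of the plugin, and the fallback to the v0.6.2 name exists only to cope with outputs written by the older release.

```python
from pathlib import Path


def read_batch_list(output: str) -> list[str]:
    """Return the batch paths listed for `output` (hypothetical helper).

    Assumes one path per line; blank lines are skipped.
    """
    # Prefer the v0.6.3 filename, then fall back to the v0.6.2 filename.
    for suffix in (".batchlist.txt", ".batches.txt"):
        candidate = Path(output + suffix)
        if candidate.exists():
            lines = candidate.read_text().splitlines()
            return [line.strip() for line in lines if line.strip()]
    raise FileNotFoundError(f"no batch list found for {output!r}")
```

If both files are present, the newer `.batchlist.txt` takes precedence.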

What's Changed

  • MRG: standardize on .batchlist.txt as batch list filename by @bluegenes in #251
  • MRG: upd to 0.6.3; cleaner arg notification by @bluegenes in #255

Full Changelog: v0.6.2...v0.6.3

v0.6.2

18 Apr 20:34
4fa15ff
  • adds --allow-completed, an option to allow batched restart to exit cleanly (code 0) if there are existing signatures but no new signatures could be written on this restart.
  • adds an additional output file for batched runs, {output}.batches.txt, which lists all signature batches created.

What's Changed

  • MRG: optionally allow empty sig zips; write list of batch outputs to file by @bluegenes in #249
  • improve doc; bump version to 0.6.2 by @bluegenes in #250

Full Changelog: v0.6.1...v0.6.2

v0.6.1

15 Apr 17:19
ed843cf

What's Changed

  • Adds explicit checks for SIGTERM so that runs exit gracefully
  • Fixes a bug where multiple failure entries were written for gbsketch accessions missing fetch URLs
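The general idea behind explicit SIGTERM checks can be sketched in Python. This is a generic pattern, not the plugin's own implementation: the handler only sets a flag, and the work loop checks that flag between items so it can stop at a safe point. The names `request_stop` and `process` are made up for illustration.

```python
import signal
import threading

stop_requested = threading.Event()


def request_stop(signum, frame):
    # Only record the request; the work loop decides when it is safe to stop.
    stop_requested.set()


def process(items):
    """Handle items one by one, stopping early once a stop was requested."""
    done = []
    for item in items:
        if stop_requested.is_set():
            break  # leave the loop cleanly instead of dying mid-item
        done.append(item)
    return done


if __name__ == "__main__":
    # Register the handler from the main thread, then run the work loop.
    signal.signal(signal.SIGTERM, request_stop)
    print(process(["acc1", "acc2", "acc3"]))
```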

PRs

  • MRG: update help for simultaneous downloads; clean up prior n limitation code by @bluegenes in #239
  • MRG: add explicit sigterm handling by @bluegenes in #243
  • MRG: fix duplicated failure writing for missing gbsketch URLs by @bluegenes in #244
  • MRG: fix deprecations (rm use of sourmash.load_one_signature) by @bluegenes in #245
  • bump version to 0.6.1 by @bluegenes in #246

Full Changelog: v0.6.0...v0.6.1

v0.6.0

06 Apr 21:38
255299a

v0.6.0 modifies the gbsketch strategy to reduce the number of calls to the NCBI REST API (necessary due to policy changes, but a better strategy regardless). Rather than downloading all files via the API, we download a single file via the API that contains direct fetch links to all requested accessions. We then download accessions in parallel -- currently limited to 30 simultaneously. It also enables streaming processing for both gbsketch and urlsketch, reducing memory requirements. Finally, it improves batch restart for --keep-fasta by avoiding re-downloading and overwriting of completed downloads.
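The "download in parallel, but at most 30 at once" pattern can be illustrated with a semaphore. The notes do not describe how the plugin implements this, so the following is a generic Python sketch only: `fetch_one` is a stand-in for a real download, and the example URLs are invented.

```python
import asyncio

MAX_SIMULTANEOUS = 30  # limit stated in the release notes


async def fetch_one(url: str) -> str:
    """Placeholder for a real download; yields control and echoes the URL."""
    await asyncio.sleep(0)
    return url


async def fetch_all(urls: list[str], limit: int = MAX_SIMULTANEOUS) -> list[str]:
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        # At most `limit` coroutines are inside this block at any time.
        async with semaphore:
            return await fetch_one(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


if __name__ == "__main__":
    demo = [f"https://example.org/genome_{i}.fna.gz" for i in range(100)]
    print(len(asyncio.run(fetch_all(demo))))
```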

Note: gbsketch now relies on gzip processing (internal crc32 checks) rather than md5sum checks to ensure we have complete downloads. urlsketch will still check md5sums if they are provided.
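To make the distinction concrete: a gzip stream ends with a CRC32 and length trailer, so decompressing a file to the end detects truncated or corrupted downloads without any external checksum. The sketch below shows both kinds of check with Python's standard library. It illustrates the idea and is not the plugin's code; the function names are hypothetical.

```python
import gzip
import hashlib
import zlib


def gzip_is_complete(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the gzip file decompresses fully and passes its CRC32 check."""
    try:
        with gzip.open(path, "rb") as fh:
            # Reading to EOF forces the trailer (CRC32 and length) to be verified.
            while fh.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile subclasses OSError; truncated files raise EOFError.
        return False


def md5_matches(path: str, expected: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's md5 hex digest equals `expected`."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()
```

The gzip check needs no side information, whereas the md5 check only works when an expected checksum is supplied, which matches the split between gbsketch and urlsketch described in the note.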

Full Changelog: v0.5.0...v0.6.0

v0.5.0

29 Jan 00:56
5870f10

This release includes some major functionality changes:

  • Download via NCBI REST API for gbsketch. Input file no longer uses ftp_path.
  • add --n-simultaneous-downloads parameter, and allow up to 10 if using an API KEY with gbsketch
  • Allow merged & ranged sketching in urlsketch
  • Enable building skipmer signatures (skipm1n3, skipm2n3; sourmash experimental addition)

And some nice UX updates:

  • use input csv as default base filename for --fail and --checksum-fail
  • ignore extra columns in gbsketch input CSV

It also fixes a bug where directsketch zips did not properly record n_hashes and thus did not get properly summarized via sourmash sig summarize.

This release includes first content contributions from @ctb 🎉 .

What's Changed

Functionality updates

  • MRG: modify n simultaneous downloads; update buildutils by @bluegenes in #154
  • MRG: add skipmer sketching by @bluegenes in #159
  • MRG: fix manifest n_hashes + test by @bluegenes in #171
  • MRG: Enable merged sigs, sequence range selection in urlsketch by @bluegenes in #161
  • MRG: batched zip reporting - notify after finishing batch to be clearer by @bluegenes in #179
  • MRG: download via NCBI REST API by @bluegenes in #181
  • MRG: doc rerunning failures by @bluegenes in #184
  • MRG: set n-simultaneous-downloads to 9 if api key provided by @ctb in #194
  • MRG: provide default failed filenames based on CSV by @ctb in #195
  • MRG: ignore extra columns in gbsketch input CSV by @ctb in #188

Full Changelog: v0.4.1...v0.5.0

v0.4.1

22 Oct 01:21
fafdb7a

What's Changed

This release includes a bugfix where using a zipfile without an explicit path would yield an error (#118). The remaining changes are internal, including adding parameter string validation and improving the sketching utilities for potential use in other plugins.

Full Changelog: v0.4.0...v0.4.1

v0.4.0

04 Oct 18:58
b1afbcd

This release introduces two new parameters:

  • --checksum-failures - an output file that logs any failures in downloading or parsing the checksum file, as well as any md5sum mismatches. Required for gbsketch.
  • --batch-size - enables writing smaller, batched zipfiles. This is recommended for large database generation, as batches allow restart after unexpected failure. It should also address some issues arising from extremely large zips.

Under the hood, this release also introduces a standardized sketch-building framework that may be useful outside of this plugin.

Full Changelog: v0.3.2...v0.4.0

v0.3.2

14 Jun 21:09
81242ac

New Contributors

  • @ctb made their first contribution in #52

Full Changelog: v0.3.1...v0.3.2

v0.3.1

21 May 07:10
ef97067
  • fixes URL formatting bug in failure output
  • adds new urlsketch command
  • changes the failure output format for both gbsketch and urlsketch. The new header is: accession,name,moltype,md5sum,download_filename,url, which matches the urlsketch input format.
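Because the failure file shares its header with the urlsketch input, a script can check the header before feeding failures back in. The sketch below assumes the v0.3.1 header exactly as quoted above; `load_failures` is a hypothetical helper, and later releases may have changed the columns.

```python
import csv

# Header quoted in the v0.3.1 notes.
FAILURE_HEADER = ["accession", "name", "moltype", "md5sum", "download_filename", "url"]


def load_failures(handle) -> list[dict]:
    """Read a failure CSV and return its rows, checking the header first."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != FAILURE_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)
```

Any open text file handle (or an `io.StringIO`) can be passed in.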

Full Changelog: v0.3.0...v0.3.1

v0.3.0

13 May 23:09

This release fixes a bug where the wrong version could be downloaded (#27).

The input format has changed slightly! Required columns are now: accession,name,ftp_path. The ftp_path column must be present, but its values may be empty.

  • if ftp_path is provided, it is used as the path for finding files associated with the accession. Otherwise, gbsketch will build the ftp_path from the accession.
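The v0.3.0 input rule (all three columns present, ftp_path values optional) can be checked with a few lines of Python. This is a sketch under that reading only; `read_gbsketch_rows` is a hypothetical helper, and the v0.5.0 notes state that the input file no longer uses ftp_path, so it applies to this format alone.

```python
import csv

REQUIRED_COLUMNS = ("accession", "name", "ftp_path")


def read_gbsketch_rows(handle):
    """Return (accession, name, ftp_path or None) tuples from a v0.3.0-style CSV."""
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in present]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    rows = []
    for row in reader:
        # An empty ftp_path is allowed; represent it as None.
        ftp_path = (row["ftp_path"] or "").strip() or None
        rows.append((row["accession"], row["name"], ftp_path))
    return rows
```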

What's Changed

  • optionally use ftp_path input for gbsketch by @bluegenes in #29
  • prevent unneccesary downloads by also setting genomes-only/proteomes-only via params if not keeping fastas by @bluegenes in #30
  • do not require signature output file if not sketching by @bluegenes in #31

Full Changelog: v0.2.3...v0.3.0