Releases: sourmash-bio/sourmash_plugin_directsketch
v0.6.3
This is a bugfix that changes the name of the file generated when building batched zipfiles to {output}.batchlist.txt, rather than {output}.batches.txt, in order to match (and unify) the documentation.
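For scripts that consume batched output, here is a minimal Python sketch of locating and reading the batch list. It assumes two things the notes don't spell out. First, {output} is the output path exactly as given, so sigs.zip yields sigs.zip.batchlist.txt. Second, the file holds one batch path per line. Both function names are hypothetical.

```python
from pathlib import Path


def batchlist_path(output: str) -> Path:
    # Assumption: the list file is the output path with ".batchlist.txt" appended.
    return Path(f"{output}.batchlist.txt")


def read_batch_list(output: str) -> list[str]:
    # Assumption: one batch path per line; blank lines are ignored.
    text = batchlist_path(output).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]
```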
What's Changed
- MRG: standardize on .batchlist.txt as batch list filename by @bluegenes in #251
- MRG: upd to 0.6.3; cleaner arg notification by @bluegenes in #255
Dependabot
- Bump pyo3 from 0.24.1 to 0.24.2 by @dependabot in #253
- Bump tokio-util from 0.7.14 to 0.7.15 by @dependabot in #252
Full Changelog: v0.6.2...v0.6.3
v0.6.2
- adds --allow-completed, an option to allow a batched restart to exit cleanly (code 0) if there are existing signatures but no new signatures could be written on this restart.
- adds an additional output file for batched runs, {output}.batches.txt: a file containing a list of all signature batches created.
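The exit-code rule for --allow-completed can be sketched as a small decision function. This is a hypothetical Python illustration, not the plugin's Rust code. In particular, returning 1 when nothing new is written and --allow-completed is absent is an assumption; the notes only say that the flag allows a clean exit (code 0).

```python
def restart_exit_code(n_existing: int, n_new: int, allow_completed: bool) -> int:
    """Hypothetical sketch of the --allow-completed decision on a batched restart."""
    if n_new > 0:
        return 0  # new signatures were written: ordinary success
    if n_existing > 0 and allow_completed:
        return 0  # all work was finished by an earlier run: exit cleanly
    return 1  # nothing written (assumed non-zero code for this case)
```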
What's Changed
- MRG: optionally allow empty sig zips; write list of batch outputs to file by @bluegenes in #249
- improve doc; bump version to 0.6.2 by @bluegenes in #250
Full Changelog: v0.6.1...v0.6.2
v0.6.1
What's Changed
- adds explicit checks for SIGTERM and graceful exit
- fixes a bug where we wrote multiple failure entries for gbsketch accessions missing fetch URLs
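A common pattern for graceful SIGTERM handling works like this. The signal handler only sets a flag. The work loop checks that flag and stops at a safe point, so partially written outputs can still be finalized. The Python sketch below shows that general pattern. directsketch itself is written in Rust, and its actual handling may differ. All names here are hypothetical.

```python
import signal
import threading

# Set by the handler; checked by the work loop between items.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Do no real work in the handler: just record that shutdown was requested.
    stop_requested.set()


def install_sigterm_handler() -> None:
    # Must be called from the main thread.
    signal.signal(signal.SIGTERM, handle_sigterm)


def process_all(items, process_one):
    """Process items in order, stopping early once SIGTERM has been requested.

    Returns the items that were fully processed so the caller can finalize
    outputs (e.g. close an open zip) before exiting.
    """
    done = []
    for item in items:
        if stop_requested.is_set():
            break  # stop before starting new work
        process_one(item)
        done.append(item)
    return done
```

In a real run, install_sigterm_handler() would be called once at startup. The item in progress when the signal arrives still finishes before the loop exits.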
PRs
- MRG: update help for simultaneous downloads; clean up prior n limitation code by @bluegenes in #239
- MRG: add explicit sigterm handling by @bluegenes in #243
- MRG: fix duplicated failure writing for missing gbsketch URLs by @bluegenes in #244
- MRG: fix deprecations (rm use of sourmash.load_one_signature) by @bluegenes in #245
- bump version to 0.6.1 by @bluegenes in #246
Dependabot updates
- Bump zip from 2.6.0 to 2.6.1 by @dependabot in #238
- Bump openssl from 0.10.71 to 0.10.72 by @dependabot in #237
- Bump tokio from 1.44.1 to 1.44.2 by @dependabot in #236
- Bump anyhow from 1.0.97 to 1.0.98 by @dependabot in #241
Full Changelog: v0.6.0...v0.6.1
v0.6.0
v0.6.0 modifies the gbsketch strategy to reduce the number of calls to the NCBI REST API (necessary due to policy changes, but a better strategy regardless). Rather than downloading all files via the API, we download a single file via the API that contains direct fetch links to all requested accessions. We then download accessions in parallel, currently limited to 30 simultaneously. This release also enables streaming processing for both gbsketch and urlsketch, reducing memory requirements. Finally, it improves batch restart for --keep-fasta by avoiding re-downloading and overwriting completed downloads.
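The second step, parallel downloads under a fixed cap, can be sketched with an asyncio.Semaphore. This is a Python illustration only, and it substitutes stand-ins for the real pieces:

- the URL list takes the place of the links obtained from the single API call;
- asyncio.sleep takes the place of the actual HTTP transfer;
- the peak-concurrency counter is there only to show that the cap of 30 holds.

```python
import asyncio
from dataclasses import dataclass

MAX_SIMULTANEOUS = 30  # cap quoted in the v0.6.0 notes


@dataclass
class Stats:
    running: int = 0  # downloads currently in progress
    peak: int = 0  # highest value `running` ever reached


async def fetch(url: str, sem: asyncio.Semaphore, stats: Stats) -> str:
    async with sem:  # waits while MAX_SIMULTANEOUS downloads are active
        stats.running += 1
        stats.peak = max(stats.peak, stats.running)
        await asyncio.sleep(0.01)  # stand-in for the real HTTP transfer
        stats.running -= 1
        return url


async def download_all(urls: list[str]) -> tuple[list[str], int]:
    sem = asyncio.Semaphore(MAX_SIMULTANEOUS)
    stats = Stats()
    # gather starts every task at once; the semaphore does the throttling.
    results = await asyncio.gather(*(fetch(u, sem, stats) for u in urls))
    return list(results), stats.peak
```

Starting all tasks and throttling with a semaphore keeps the code simple. The cap is enforced at the point where a transfer begins, not where tasks are created.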
Note: gbsketch now relies on gzip processing (internal crc32 checks) rather than md5sum checks to ensure we have complete downloads. urlsketch will still check md5sums if they are provided.
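Gzip processing can stand in for an md5sum completeness check because every gzip member ends with a CRC32 of the uncompressed data plus its length. A truncated or corrupted download therefore fails to decompress cleanly. Below is a minimal standard-library Python sketch of both checks. It illustrates the idea and does not reproduce the plugin's code.

```python
import gzip
import hashlib
import zlib


def gzip_is_complete(data: bytes) -> bool:
    """Return True if `data` decompresses as a complete, CRC-valid gzip stream."""
    try:
        # gzip.decompress verifies each member's CRC32 and length trailer.
        gzip.decompress(data)
    except (EOFError, OSError, zlib.error):
        # EOFError: truncated stream; gzip.BadGzipFile (an OSError subclass):
        # bad header or CRC mismatch; zlib.error: corrupt deflate data.
        return False
    return True


def md5_matches(data: bytes, expected_hex: str) -> bool:
    # urlsketch-style check, only possible when an expected md5sum is provided.
    return hashlib.md5(data).hexdigest() == expected_hex.lower()
```

The CRC32 check only confirms that the file is internally consistent. An md5sum can also catch a complete but wrong file, which is why it stays useful when one is available.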
What's Changed
- MRG: upd readme: installation, authors by @bluegenes in #200
- Update README.md by @bluegenes in #201
- add more api key docs by @bluegenes in #206
- MRG: fix gbsketch NCBI downloads by using dehydrate-rehydrate approach by @bluegenes in #222
- update some crates by @bluegenes in #229
- Bump openssl from 0.10.69 to 0.10.71 by @dependabot in #205
- MRG: gbsketch streaming processing by @bluegenes in #230
- upd zip to 2.6 by @bluegenes in #231
- MRG: streaming urlsketch processing by @bluegenes in #232
- MRG: Consolidate reused code across gbsketch/urlsketch by @bluegenes in #233
- MRG: optionally avoid re-downloading existing FASTA when using --keep-fasta by @bluegenes in #235
- MRG: upd version to 0.6.0; pin rust to 1.74 by @bluegenes in #234
Full Changelog: v0.5.0...v0.6.0
v0.5.0
This release includes some major functionality changes:
- Download via NCBI REST API for gbsketch. Input file no longer uses ftp_path.
- add --n-simultaneous-downloads parameter, and allow up to 10 if using an API key with gbsketch
- Allow merged & ranged sketching in urlsketch
- Enable building skipmer signatures (skipm1n3, skipm2n3; sourmash experimental addition)
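A skipmer keeps m out of every n consecutive positions until k bases have been collected. Reading skipm1n3 and skipm2n3 as m=1, n=3 and m=2, n=3 is an assumption based on the names. The Python sketch below shows only the generic position pattern. It does not reproduce sourmash's implementation, including k size, strand handling, and hashing.

```python
import math


def skipmer_offsets(m: int, n: int, k: int) -> list[int]:
    """Positions used by a skipmer keeping m of every n bases, k bases in total."""
    if not (0 < m <= n and k > 0):
        raise ValueError("require 0 < m <= n and k > 0")
    cycles = math.ceil(k / m)
    offsets = [c * n + j for c in range(cycles) for j in range(m)]
    return offsets[:k]


def skipmers(seq: str, m: int, n: int, k: int) -> list[str]:
    """All skipmers of `seq`, one per start position."""
    offsets = skipmer_offsets(m, n, k)
    span = offsets[-1] + 1  # window length covered in the original sequence
    return [
        "".join(seq[start + o] for o in offsets)
        for start in range(len(seq) - span + 1)
    ]
```

For example, with m=2, n=3, k=4, the offsets are 0, 1, 3, 4. Each skipmer therefore spans 5 bases of the input while keeping 4.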
And some nice UX updates:
- use input CSV as default base filename for --fail and --checksum-fail
- ignore extra columns in gbsketch input CSV
It also fixes a bug where directsketch zips did not properly record n_hashes and thus were not properly summarized via sourmash sig summarize.
This release includes the first content contributions from @ctb 🎉.
What's Changed
Functionality updates
- MRG: modify n simultaneous downloads; update buildutils by @bluegenes in #154
- MRG: add skipmer sketching by @bluegenes in #159
- MRG: fix manifest n_hashes + test by @bluegenes in #171
- MRG: Enable merged sigs, sequence range selection in
urlsketch
by @bluegenes in #161 - MRG: batched zip reporting - notify after finishing batch to be clearer by @bluegenes in #179
- MRG: download via NCBI REST API by @bluegenes in #181
- MRG: doc rerunning failures by @bluegenes in #184
- MRG: set n-simultaneous-downloads to 9 if api key provided by @ctb in #194
- MRG: provide default failed filenames based on CSV by @ctb in #195
- MRG: ignore extra columns in gbsketch input CSV by @ctb in #188
Developer updates
- MRG: remove BuildParams, filter via manifest/Select approaches by @bluegenes in #127
- try fixing ci by @bluegenes in #157
- upd sourmash core to 0.17.2 by @bluegenes in #156
- bump version; add ctb to authors by @bluegenes in #199
dependabot
- Bump tokio from 1.40.0 to 1.41.0 by @dependabot in #130
- Bump pyo3 from 0.22.5 to 0.23.3 by @dependabot in #151
- Bump tokio from 1.41.0 to 1.42.0 by @dependabot in #150
- Bump reqwest from 0.12.8 to 0.12.9 by @dependabot in #136
- Bump regex from 1.11.0 to 1.11.1 by @dependabot in #133
- Bump tokio-util from 0.7.12 to 0.7.13 by @dependabot in #149
- Bump anyhow from 1.0.90 to 1.0.94 by @dependabot in #152
- Bump serde_json from 1.0.132 to 1.0.134 by @dependabot in #162
- Update pytest-cov requirement from <6.0,>=2.12 to >=2.12,<7.0 by @dependabot in #135
- Bump anyhow from 1.0.94 to 1.0.95 by @dependabot in #163
- Bump reqwest from 0.12.9 to 0.12.12 by @dependabot in #169
- Bump serde from 1.0.216 to 1.0.217 by @dependabot in #167
- Bump tokio from 1.42.0 to 1.43.0 by @dependabot in #176
- Bump pyo3 from 0.23.3 to 0.23.4 by @dependabot in #178
- Bump serde_json from 1.0.134 to 1.0.135 by @dependabot in #177
- Bump serde_json from 1.0.135 to 1.0.137 by @dependabot in #190
- Bump getset from 0.1.3 to 0.1.4 by @dependabot in #197
- Bump openssl from 0.10.68 to 0.10.69 by @dependabot in #196
Full Changelog: v0.4.1...v0.5.0
v0.4.1
What's Changed
This release includes a bugfix where using a zipfile without an explicit path would yield an error (#118). The remaining changes are internal, including adding parameter string validation and improving the sketching utilities for potential use in other plugins.
- MRG: refactor sketching utilities by @bluegenes in #112
- MRG: validate param strings by @bluegenes in #114
- MRG: update sourmash core to 0.16.0 by @bluegenes in #115
- MRG: fix bug in zip paths if output provided in current dir by @bluegenes in #121
- bump to 0.4.1 by @bluegenes in #128
dependabot
- Bump reqwest from 0.12.7 to 0.12.8 by @dependabot in #110
- Bump futures from 0.3.30 to 0.3.31 by @dependabot in #111
- Bump pyo3 from 0.22.3 to 0.22.5 by @dependabot in #122
- Bump anyhow from 1.0.89 to 1.0.90 by @dependabot in #126
- Bump serde_json from 1.0.128 to 1.0.132 by @dependabot in #124
- Bump openssl from 0.10.66 to 0.10.68 by @dependabot in #125
Full Changelog: v0.4.0...v0.4.1
v0.4.0
This release introduces two new parameters:
- --checksum-failures: an output file to log any failures with checksum file download and parsing, or any md5sum mismatches. Required for gbsketch.
- --batch-size: enables writing smaller, batched zipfiles. This is recommended for large database generation, as batches allow restart after an unexpected failure. It should also address some issues arising from extremely large zips.
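Fixed-size batches are what make restart possible. Finished batches stay on disk, and only unfinished work is repeated. Below is a minimal Python sketch of that bookkeeping. It assumes the set of already-sketched accessions has been recovered from the existing batch zips. The function names are hypothetical.

```python
def make_batches(accessions: list[str], batch_size: int) -> list[list[str]]:
    # Consecutive chunks of at most batch_size accessions each.
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [accessions[i:i + batch_size] for i in range(0, len(accessions), batch_size)]


def remaining_work(accessions: list[str], already_sketched: set[str]) -> list[str]:
    # On restart, keep only accessions not found in previously written batches.
    return [acc for acc in accessions if acc not in already_sketched]
```

On a restart, the two combine as make_batches(remaining_work(all_accessions, done), batch_size).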
Under the hood, this release also introduces a standardized signature-building framework that may be useful outside of this plugin.
What's Changed
- MRG: report checksum file download failures by @bluegenes in #92
- MRG: add generic support for signature building by @bluegenes in #101
- MRG: improve restart by optionally writing batched zipfiles by @bluegenes in #102
- MRG: fix ci by moving install from mambaforge --> miniforge by @bluegenes in #106
- bump to v0.4.0 by @bluegenes in #109
Dependabot
sourmash-core:
- Bump sourmash from 0.14.0 to 0.14.1 by @dependabot in #62
- Bump sourmash from 0.14.1 to 0.15.0 by @dependabot in #75
- Bump sourmash from 0.15.0 to 0.15.1 by @dependabot in #87
- Bump sourmash from 0.15.1 to 0.15.2 by @dependabot in #103
simple-error:
- Bump simple-error from 0.3.0 to 0.3.1 by @dependabot in #59
reqwest:
- Bump reqwest from 0.12.4 to 0.12.5 by @dependabot in #60
- Bump reqwest from 0.12.5 to 0.12.7 by @dependabot in #88
lazy_static:
- Bump lazy_static from 1.4.0 to 1.5.0 by @dependabot in #61
pyo3:
- Bump pyo3 from 0.21.2 to 0.22.0 by @dependabot in #64
- Bump pyo3 from 0.22.0 to 0.22.1 by @dependabot in #66
- Bump pyo3 from 0.22.1 to 0.22.2 by @dependabot in #73
- Bump pyo3 from 0.22.2 to 0.22.3 by @dependabot in #99
serde_json:
- Bump serde_json from 1.0.117 to 1.0.119 by @dependabot in #63
- Bump serde_json from 1.0.119 to 1.0.120 by @dependabot in #67
serde:
- Bump serde from 1.0.203 to 1.0.204 by @dependabot in #65
tokio:
- Bump tokio from 1.38.0 to 1.38.1 by @dependabot in #74
- Bump tokio from 1.38.1 to 1.40.0 by @dependabot in #91
pytest:
- Update pytest requirement from <8.3.0,>=6.2.4 to >=6.2.4,<8.4.0 by @dependabot in #71
openssl:
- Bump openssl from 0.10.64 to 0.10.66 by @dependabot in #72
regex:
- Bump regex from 1.10.5 to 1.10.6 by @dependabot in #80
- Bump regex from 1.10.6 to 1.11.0 by @dependabot in #104
anyhow:
- Bump anyhow from 1.0.86 to 1.0.89 by @dependabot in #100
Full Changelog: v0.3.2...v0.4.0
v0.3.2
What's Changed
- MRG: update to sourmash-rs core r0.14.0 by @ctb in #52
- MRG: set zip permissions to 644 by @bluegenes in #53
- MRG: enable dayhoff, hp sketching by @bluegenes in #55
- bump version to 0.3.2 by @bluegenes in #54
Dependabot
- Bump tokio from 1.37.0 to 1.38.0 by @dependabot in #46
- Bump serde from 1.0.202 to 1.0.203 by @dependabot in #45
- Bump regex from 1.10.4 to 1.10.5 by @dependabot in #51
Full Changelog: v0.3.1...v0.3.2
v0.3.1
- fixes URL formatting bug in failure output
- adds new urlsketch command
- changes failure output format for both gbsketch and urlsketch. The new header is: accession,name,moltype,md5sum,download_filename,url, which matches the urlsketch input format.
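The failure file shares the urlsketch input header. That means failures can be written, and later read back as input, using the same column list. Below is a Python csv sketch under that assumption. The helper names are hypothetical.

```python
import csv
from typing import Iterable, TextIO

# Header given in the v0.3.1 notes; it matches the urlsketch input columns.
FAILURE_HEADER = ["accession", "name", "moltype", "md5sum", "download_filename", "url"]


def write_failures(rows: Iterable[dict], handle: TextIO) -> None:
    writer = csv.DictWriter(handle, fieldnames=FAILURE_HEADER)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


def read_urlsketch_input(handle: TextIO) -> list[dict]:
    # Same columns, so a failures file can be read back as input unchanged.
    return list(csv.DictReader(handle))
```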
What's Changed
- fix url printing by @bluegenes in #36
- add urlsketch command by @bluegenes in #34
Dependabot and version updates
- Bump anyhow from 1.0.83 to 1.0.86 by @dependabot in #39
- Bump serde from 1.0.201 to 1.0.202 by @dependabot in #38
- Bump camino from 1.1.6 to 1.1.7 by @dependabot in #37
- bump version to 0.3.1 by @bluegenes in #43
Full Changelog: v0.3.0...v0.3.1
v0.3.0
This release fixes a bug where the wrong version might be downloaded (#27).
The input format has changed slightly! Required columns are now: accession,name,ftp_path. The ftp_path column name must be present, but the column can be empty.
- if ftp_path is provided, it is used as the path for finding files associated with the accession. Otherwise, gbsketch will build the ftp_path from the accession.
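NCBI's genomes FTP tree places each assembly under a directory derived from its accession. The path is the GCA/GCF prefix, then the nine digits split into three groups of three, then a final {accession}_{assembly name} directory. The Python sketch below derives only the accession-based part. The final directory name includes the assembly name, so it can't be built from the accession alone and has to be looked up, for example by listing the parent directory. These notes don't describe how gbsketch performed that lookup.

```python
import re

NCBI_GENOMES_ALL = "https://ftp.ncbi.nlm.nih.gov/genomes/all"


def accession_dir_prefix(accession: str) -> str:
    """Map e.g. GCF_000005845.2 to .../genomes/all/GCF/000/005/845."""
    match = re.fullmatch(r"(GC[AF])_(\d{3})(\d{3})(\d{3})\.\d+", accession)
    if match is None:
        raise ValueError(f"not a GenBank/RefSeq assembly accession: {accession}")
    prefix, a, b, c = match.groups()
    return f"{NCBI_GENOMES_ALL}/{prefix}/{a}/{b}/{c}"
```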
What's Changed
- optionally use ftp_path input for gbsketch by @bluegenes in #29
- prevent unnecessary downloads by also setting genomes-only/proteomes-only via params if not keeping fastas by @bluegenes in #30
- do not require signature output file if not sketching by @bluegenes in #31
Full Changelog: v0.2.3...v0.3.0