Aggregate a plain non-synthetic dataset for Bio sequences #91

Open
ashvardanian opened this issue Feb 13, 2024 · 0 comments
Labels
good first issue Good for newcomers

Comments

@ashvardanian
Owner

For fair benchmarks of Needleman-Wunsch scoring algorithms, we should find a real-world protein bank and ideally export it into a whitespace- or newline-delimited .txt file that is easy to parse not only in Python but also in C++. Community contributions are more than welcome 🤗
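As a starting point for contributors, here is a minimal sketch of such an export. It assumes the source protein bank is distributed as a FASTA file, which the issue does not specify. The function names and file paths are hypothetical and are not part of this project. Each record's wrapped sequence lines are joined, the headers are dropped, and one sequence is written per line.

```python
from typing import Iterable, Iterator


def iter_fasta_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield one sequence string per FASTA record, joining wrapped lines.

    Assumption: the input follows the common FASTA layout, where a line
    starting with '>' is a header and the following lines hold the sequence.
    """
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines between records.
        if line.startswith(">"):
            # A header starts a new record; flush the previous one, if any.
            if chunks:
                yield "".join(chunks)
                chunks = []
        else:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


def export_sequences(fasta_path: str, output_path: str) -> int:
    """Write one sequence per line and return the number of sequences written.

    Both paths are hypothetical placeholders supplied by the caller.
    """
    count = 0
    with open(fasta_path, "r", encoding="ascii") as src, \
            open(output_path, "w", encoding="ascii", newline="\n") as dst:
        for sequence in iter_fasta_sequences(src):
            dst.write(sequence + "\n")
            count += 1
    return count
```

The resulting file has no headers and no wrapped lines, so a C++ reader can consume it with a plain `std::getline` loop and a Python reader can use `str.splitlines()`.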

@ashvardanian added the good first issue label Feb 13, 2024
ashvardanian added a commit that referenced this issue Feb 13, 2024
Requesting more dataset contributions #91
ashvardanian pushed a commit that referenced this issue Feb 15, 2024
# [3.1.0](v3.0.0...v3.1.0) (2024-02-15)

### Add

* `sz_isascii` and UTF8 Levenshtein distance ([a0962fb](a0962fb))
* 32-bit support with CPython ([253a3c1](253a3c1))
* Big-endian support ([b126fab](b126fab))
* Levenshtein & NW score for Rust (#89) ([663a633](663a633)), closes [#89](#89)
* Macro SZ_NULL_CHAR, Clang-CL intrinsics. (#88) ([dee90bb](dee90bb)), closes [#88](#88)
* serial clz/ctz for Win32 ([c968337](c968337))

### Docs

* sectioning contribution guide ([cf6ced0](cf6ced0)), closes [#91](#91)

### Fix

* Clamping bounded Levenshtein ([69892fb](69892fb))
* Memory leak in macro ([c88a72a](c88a72a))

### Improve

* Port to `arm32v7` 32-bit arch ([4acf3b7](4acf3b7))

### Make

* `cibuildwheel.overrides` over custom scripts ([6d8c586](6d8c586))
* Clear root directory ([7497c96](7497c96))
* Constrain workflow names ([079f111](079f111))
* Disable all CI versioning ([a55d227](a55d227))
* Drop NumPy dependency ([c56239e](c56239e))
* Fix implicit `malloc` declaration ([f7761be](f7761be))
* Infer big-endian in CMake/setup.py ([72453c6](72453c6))
* Keywords for crates.io ([8d237a6](8d237a6))
* Overwrite packs with same name ([0642318](0642318))
* Packing CIBuildWheels for all archs ([49bee70](49bee70))
* Parallel wheels compilation ([0f5a946](0f5a946))
* Upgrade GitHub CI ([cd424ca](cd424ca))
* Upgrade Python CI ([4f1bf43](4f1bf43))
* Use QEMU for Linux wheels ([ac4556a](ac4556a))
@ashvardanian changed the title from "Aggregate a plain non-synthetic dataset for protein sequences" to "Aggregate a plain non-synthetic dataset for Bio sequences" Apr 27, 2024
ashvardanian pushed a commit that referenced this issue Apr 28, 2024