Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: nondeterministic inclusion of sequences in extract-reads #210

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

ebolyen
Copy link
Member

@ebolyen ebolyen commented Jan 31, 2025

Discovered by: https://forum.qiime2.org/t/qiime-feature-classifier-extract-reads-successfully-extracts-different-number-of-reads-each-time/32437

The issue seems to be that degenerate bases are a set, which means they get a different order between Python sessions:
https://github.com/scikit-bio/scikit-bio/blob/main/skbio/sequence/_dna.py#L188-L198

As a consequence, when a terrible alignment exists, this code:

if best_score is None or score > best_score:
            best_score = score

goes with the first alignment found (which is nondeterministic). Since it's a bad alignment, none of the other concrete sequences in the degenerate list can beat it. So, depending on the specific identity of that first alignment, the sequence will be retained or rejected.

Essentially, only terrible alignments will be occasionally included, based on an (un)lucky order of the degenerate map.

We should do something other than this PR to solve the issue, since I don't think we want these bad alignments anyhow. But worst case, this will at least make the bad alignments appear the same way across different runs.

Another option, which seems to work is increasing identity which causes the bad alignments to be filtered out (even if they got lucky before).

@lizgehret
Copy link
Member

test failures caused by rna 2.2.0. fix in progress here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Development
Development

Successfully merging this pull request may close these issues.

3 participants