BUG: nondeterministic inclusion of sequences in extract-reads #210

ebolyen · 2025-01-31T17:50:26Z

Discovered by: https://forum.qiime2.org/t/qiime-feature-classifier-extract-reads-successfully-extracts-different-number-of-reads-each-time/32437

The issue seems to be that the expansions of degenerate bases are stored as sets, so their iteration order can differ between Python sessions:
https://github.com/scikit-bio/scikit-bio/blob/main/skbio/sequence/_dna.py#L188-L198
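To illustrate the ordering problem, here is a minimal sketch of expanding a degenerate sequence. `DEGENERATE_MAP` and `expand` are hypothetical stand-ins, not the scikit-bio or plugin code. Because string hashing is randomized per interpreter session by default, iterating the raw sets can yield the expansions in a different order on each run, while sorting each set pins the order down.

```python
from itertools import product

# Hypothetical subset of an IUPAC degenerate map. The values are sets,
# mirroring the structure of the linked scikit-bio code.
DEGENERATE_MAP = {
    "R": {"A", "G"},
    "Y": {"C", "T"},
    "N": {"A", "C", "G", "T"},
}


def expand(seq, stable=True):
    """Yield every concrete sequence encoded by a degenerate sequence."""
    choices = []
    for base in seq:
        options = DEGENERATE_MAP.get(base, {base})
        # sorted() fixes the order; list() keeps whatever order the set has
        # in this session, which may change between Python sessions.
        choices.append(sorted(options) if stable else list(options))
    for combo in product(*choices):
        yield "".join(combo)


expansions = list(expand("ARY"))
print(expansions)
```

With `stable=False` the same four sequences come out, but their order is not guaranteed to match between runs.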

As a consequence, when a terrible alignment exists, this code:

    if best_score is None or score > best_score:
        best_score = score

keeps the first alignment found, and which alignment comes first is nondeterministic. Since it's a bad alignment, none of the other concrete sequences in the degenerate expansion can beat it. So, depending on the specific identity of that first alignment, the sequence will be retained or rejected.

Essentially, only terrible alignments will be occasionally included, based on an (un)lucky order of the degenerate map.
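The tie-breaking behavior can be sketched as follows. This is an illustration under assumptions, not the plugin's code: `pick_best`, the candidate sequences, and the scores are all hypothetical, and sorting is just one way to make the order deterministic. It assumes the bad alignments score equally, so the strict `>` never replaces the first candidate.

```python
def pick_best(candidates, score_fn):
    """Return the first candidate with the highest score."""
    best, best_score = None, None
    for cand in candidates:
        score = score_fn(cand)
        # Strict '>' means a later candidate with an equal score
        # never replaces the one found first.
        if best_score is None or score > best_score:
            best, best_score = cand, score
    return best, best_score


# Hypothetical scores: both expansions align equally badly.
scores = {"AAC": 3, "AGC": 3}

# Two iteration orders, as might occur in two different Python sessions.
order_a = ["AAC", "AGC"]
order_b = ["AGC", "AAC"]

winner_a, _ = pick_best(order_a, scores.get)
winner_b, _ = pick_best(order_b, scores.get)

# Sorting the candidates first makes the winner independent of input order.
stable_a, _ = pick_best(sorted(order_a), scores.get)
stable_b, _ = pick_best(sorted(order_b), scores.get)

print(winner_a, winner_b, stable_a, stable_b)
```

Sorting does not make the retained alignment any better; it only makes the choice repeatable.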

We should do something beyond this PR to really solve the issue, since I don't think we want these bad alignments anyhow. But at worst, this will at least make the bad alignments appear the same way across different runs.

Another option, which seems to work, is increasing identity, which causes the bad alignments to be filtered out (even if they got lucky before).
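A rough sketch of why a higher identity helps, assuming identity is checked as the fraction of matching primer positions (an assumption for illustration; `is_retained` and the match counts are hypothetical, not the plugin's implementation):

```python
def is_retained(matched_positions, primer_length, identity):
    """Hypothetical filter: keep a read if enough primer positions match."""
    return matched_positions / primer_length >= identity


PRIMER_LENGTH = 20
# Hypothetical match counts for the same read under two expansion orders.
lucky, unlucky = 16, 15

# At a lower threshold the outcome depends on which alignment came first.
loose_outcomes = {is_retained(m, PRIMER_LENGTH, 0.8) for m in (lucky, unlucky)}

# At a higher threshold both alignments are rejected, so order stops mattering.
strict_outcomes = {is_retained(m, PRIMER_LENGTH, 0.9) for m in (lucky, unlucky)}

print(loose_outcomes, strict_outcomes)
```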

lizgehret · 2025-01-31T19:54:00Z

test failures caused by rna 2.2.0. fix in progress here.

WIP

cc1079d

Oddant1 assigned gregcaporaso Feb 6, 2025
