find_chunks in _load_seq.py does not end a chunk early at the end of a supercontig #43
Comments
Original comment by Michael Hoffman (Bitbucket: hoffman, GitHub: michaelmhoffman). What do you mean, it "does not index" this data?
Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86). The |
Original comment by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86). After a discussion, the following solution was proposed:
|
Original report (archived issue) by Eric Roberts (Bitbucket: ericr86, GitHub: ericr86).
Currently, Genomedata does not index runs of missing data longer than MIN_GAP_LEN.
However, if the end of a supercontig consists entirely of NaNs, that data is indexed regardless of its length. In the extreme case, a supercontig could start with a single data point followed only by NaNs, and the chunk start and end would still span the entire region, even if that region were far longer than MIN_GAP_LEN.
As a result, Genomedata reports large empty regions at the beginning and end of supercontigs when the "chunk_starts/ends" attributes are used.
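To make the expected behavior concrete, here is a minimal sketch of a chunk finder that ends a chunk at the last present data point instead of at the end of the supercontig. It is not the actual find_chunks code from _load_seq.py: the function name find_chunks_sketch, the flat list input, and the MIN_GAP_LEN value of 3 are all assumptions made for illustration.

```python
import math

# Illustrative value only; not Genomedata's actual setting.
MIN_GAP_LEN = 3


def find_chunks_sketch(values, min_gap_len=MIN_GAP_LEN):
    """Return (starts, ends) for half-open chunks of present data.

    Hypothetical helper, not Genomedata's find_chunks. A run of NaNs
    longer than min_gap_len splits two chunks. Leading and trailing
    NaNs are never included in a chunk, whatever their length.
    """
    present = [i for i, v in enumerate(values) if not math.isnan(v)]
    starts, ends = [], []
    if not present:
        return starts, ends

    chunk_start = prev = present[0]
    for i in present[1:]:
        gap_len = i - prev - 1  # number of NaNs between two data points
        if gap_len > min_gap_len:
            starts.append(chunk_start)
            ends.append(prev + 1)  # end right after the last data point
            chunk_start = i
        prev = i

    # Close the final chunk at the last data point, not at len(values).
    starts.append(chunk_start)
    ends.append(prev + 1)
    return starts, ends


nan = float("nan")

# The extreme case from the report: one data point, then only NaNs.
extreme = [1.0] + [nan] * 10
print(find_chunks_sketch(extreme))  # ([0], [1])
```

With this approach the extreme case yields a single chunk covering only position 0, so the trailing NaN run is not reported as part of any chunk.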