Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify the name tokeniser uncomp_len calculation (PR #803) #803

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jkbonfield
Copy link
Contributor

This includes all visible read name bytes plus 1 termination byte per name (e.g. '\0').

Fixes #802

Copy link

github-actions bot commented Jan 7, 2025

Changed PDFs as of bece1f7: CRAMcodecs (diff).

CRAMcodecs.tex Outdated
The serialised data stream starts with two unsigned little endian
32-bit integers holding the total size of uncompressed name buffer and
the number of read names. This is followed the array elements
themselves. Note the uncompressed size the sum of all name lengths
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
themselves. Note the uncompressed size the sum of all name lengths
themselves. Note the uncompressed size is calculated as the sum of all name lengths,

@cmnbroad
Copy link
Contributor

cmnbroad commented Jan 7, 2025

Looks good - thanks!

@jkbonfield jkbonfield added sam cram and removed sam labels Jan 7, 2025
This includes all visible read name bytes plus 1 termination byte per
name (e.g. '\0').

Fixes samtools#802
Copy link

github-actions bot commented Jan 7, 2025

Changed PDFs as of 4982e03: CRAMcodecs (diff).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Needs review
Development

Successfully merging this pull request may close these issues.

Name tokenizer codec clarification
2 participants