Clarify the name tokeniser uncomp_len calculation (PR #803)#803
Merged
jkbonfield merged 1 commit intosamtools:masterfrom Mar 24, 2025
Merged
Clarify the name tokeniser uncomp_len calculation (PR #803)#803jkbonfield merged 1 commit intosamtools:masterfrom
jkbonfield merged 1 commit intosamtools:masterfrom
Conversation
|
Changed PDFs as of bece1f7: CRAMcodecs (diff). |
cmnbroad
reviewed
Jan 7, 2025
Contributor
|
Looks good - thanks! |
jkbonfield
added a commit
to jkbonfield/hts-specs
that referenced
this pull request
Jan 7, 2025
This includes all visible read name bytes plus 1 termination byte per name (e.g. '\0'). Fixes samtools#802
bece1f7 to
4982e03
Compare
|
Changed PDFs as of 4982e03: CRAMcodecs (diff). |
zaeleus
reviewed
Jan 22, 2025
CRAMcodecs.tex
Outdated
Comment on lines
2455
to
2456
| the number of read names. This is followed the array elements | ||
| themselves. Note the uncompressed size is calculated as the sum of |
There was a problem hiding this comment.
The serialised data stream starts with two unsigned little endian 32-bit integers... This is followed the array elements themselves.
This is unrelated to the length calculation, but note there is also a 1 byte flag between the 2 integers and the data stream:
| Bytes | Type | Name |
|---|---|---|
| 4 | uint32 | uncomp_length |
| 4 | uint32 | num_reads |
| 1 | uint8 | use_arith |
Contributor
Author
There was a problem hiding this comment.
Agreed. It also had a poor and nebulous "array elements" term which I expanded on.
jkbonfield
added a commit
to jkbonfield/hts-specs
that referenced
this pull request
Jan 28, 2025
This includes all visible read name bytes plus 1 termination byte per name (e.g. '\0'). Fixes samtools#802 Also clarify the name tokeniser serialisation description. Acknowledge the 1-byte "use_arith" field and replace the nebulous "array elements" with a more descriptive text about token streams.
4982e03 to
7de0ae0
Compare
|
Changed PDFs as of 7de0ae0: CRAMcodecs (diff). |
This includes all visible read name bytes plus 1 termination byte per name (e.g. '\0'). Fixes samtools#802 Also clarify the name tokeniser serialisation description. Acknowledge the 1-byte "use_arith" field and replace the nebulous "array elements" with a more descriptive text about token streams.
7de0ae0 to
7b61997
Compare
|
Changed PDFs as of 7b61997: CRAMcodecs (diff). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This includes all visible read name bytes plus 1 termination byte per name (e.g. '\0').
Fixes #802