-
Notifications
You must be signed in to change notification settings - Fork 179
Description
I'm trying to finish up Yash's CRAM 3.1 codecs for htsjdk, and I came across an ambiguity in the name tokenizer spec that could use clarification. It currently states "The serialised data stream starts with two unsigned little endian 32-bit integers holding the total size of uncompressed name buffer and the number of read names".
Based on the values I see when inspecting samtools CRAMs/code, the uncompressed name buffer size assumes that the uncompressed names are formatted into a buffer the way htslib formats them, i.e., on decode, it would match the length of the read name data that is reconstructed from the stream(s), PLUS one byte for a separator for each read name, including a terminal separator. I don't think that's stated anywhere in the spec, so if correct, it would be useful to state explicitly.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status