§ 8.4.1 "Compression header block: Preservation map" describes the reference required (RR) flag as "true if reference sequence is required to restore the data completely". If this is true and the records do not require a reference sequence to restore the data (e.g., an unmapped slice), is it considered an invalid state?
As I understand it, this flag should be false if the slice is unmapped, but some implementations don't set it as such, e.g., htslib:
All the records in this slice are unmapped (cram_dump: "Slice ref seq -1"), and this implicitly sets the reference required field to true, as per "The boolean values are optional, defaulting to true when absent..." However, a reference sequence is not required to decode this.
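To make the question concrete, here is a minimal Python sketch of the check being described. It assumes the preservation map has already been decoded into a dict keyed by the spec's two-letter names; `effective_bool`, `SliceSummary` and `rr_is_redundant` are hypothetical names for illustration, not htslib or spec APIs.

```python
from dataclasses import dataclass

# Hypothetical, already-decoded preservation map: keys are the two-letter
# names from the spec (e.g. "RN", "AP", "RR"); keys absent from the block
# are simply absent from the dict.
PreservationMap = dict[str, object]


def effective_bool(pmap: PreservationMap, key: str) -> bool:
    """Boolean preservation-map values default to true when absent."""
    return bool(pmap.get(key, True))


@dataclass
class SliceSummary:
    ref_seq_id: int  # -1 means every record in the slice is unmapped


def rr_is_redundant(pmap: PreservationMap, sl: SliceSummary) -> bool:
    """True when RR says a reference is needed but the slice is unmapped."""
    return effective_bool(pmap, "RR") and sl.ref_seq_id == -1
```

With an htslib-style map that omits RR, `rr_is_redundant({}, SliceSummary(ref_seq_id=-1))` is true, which is exactly the state the question asks about; writing `{"RR": False}` explicitly makes it false.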
That's a good point. CRAM v1 didn't have RR. It was added in v2 so we could encode data without needing a reference, which is useful for unmapped data and sometimes also for mapped but unsorted data. It looks like I never added the explicit RR=0 data in there, though.
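On the encoder side, emitting the explicit RR=0 might look something like the sketch below. `Record`, `encoded_against_ref`, `reference_required` and `build_preservation_map` are hypothetical names, and the rule used (a reference is needed only if some mapped record was actually encoded against it) is an assumption based on the description above, not htslib's logic.

```python
from dataclasses import dataclass


@dataclass
class Record:
    ref_id: int                # -1 for unmapped records
    encoded_against_ref: bool  # hypothetical: was this record diffed against a reference?


def reference_required(records: list[Record]) -> bool:
    # Unmapped records never need a reference to decode; mapped records need
    # one only if the encoder chose reference-based sequence encoding.
    return any(r.ref_id != -1 and r.encoded_against_ref for r in records)


def build_preservation_map(records: list[Record]) -> dict[str, bool]:
    pmap: dict[str, bool] = {}
    # Absent booleans default to true, so RR=0 has to be written explicitly;
    # RR=1 can be left out because that is already the default.
    if not reference_required(records):
        pmap["RR"] = False
    return pmap
```

For an all-unmapped container this yields `{"RR": False}`; for mapped, reference-encoded records it leaves RR absent and relies on the default.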
I'm not sure whether this makes the htslib files explicitly non-conforming or not. Arguably yes, as the spec is written, but the intention was simply to use this as a hint for whether a reference needs to be loaded, and where the container and slice header explicitly. Given the container has "ref id: -1", it's neither here nor there whether it attempts to load it or not, as it's akin to saying "don't load the reference" vs "load reference ".
I'll chalk it up as an htslib bug, however.
Edit: I tested picard 2.26 (we don't have a new enough Java for picard 3) and it explicitly sets RR=1, which is even more inappropriate. Bah.
This is in regard to CRAM format specification (version 3.1) (2024-09-04).