Skip to content

Cigar interface#12

Open
magicDGS wants to merge 1 commit intomasterfrom
dgs_cigar_interface
Open

Cigar interface#12
magicDGS wants to merge 1 commit intomasterfrom
dgs_cigar_interface

Conversation

@magicDGS
Copy link
Copy Markdown
Member

No description provided.

@magicDGS magicDGS force-pushed the dgs_cigar_interface branch from 6acaf3e to 926e31b Compare July 18, 2018 17:54
@lindenb
Copy link
Copy Markdown

lindenb commented Jul 19, 2018

@magicDGS in Cigar : shouldn't you use long instead of int for the lengths ?

@magicDGS
Copy link
Copy Markdown
Member Author

@lindenb - that's a good question, as I was thinking about it for some time (also in the draft that I am preparing for reads), and I decided to use int instead of long for several reasons:

  • Arrays/Lists in java are always int indexed - thus, the maximum lenght for most implementations will be Integer.MAX_VALUE (231-1)
  • In the SAM specs, the template length (always lower or equal than the read length unless there is only one read) is in the range [-231+1, 231-1], which fits into the java Integer.MAX_VALUE range
  • In the BAM specs, both l_seq (int32_t) and n_cigar_op (int16_t) fit into the java Integer.MAX_VALUE range
  • In the CRAM v3 specs, the read lenghts are defined as int, which is also a signed 32-bit integer. Thus, it also fits in the Integer.MAX_VALUE range

In the case of Locatable and other reference-based coordinates, I proposed the change of the API to use long as you suggest in your comment (see proposal for Locatable interface in #6).

@lindenb - are you worried of integer overflow due to the sum of the element's length? In that case, either we return long as you suggest or explicitly add to the API that the sum of each element length should not be more than Integer.MAX_VALUE to avoid the issue (we can add a check in default implementation).

What do you think, @samtools/htsjdk-next-maintainers?

@lindenb
Copy link
Copy Markdown

lindenb commented Jul 19, 2018

are you worried of integer overflow due to the sum of the element's length?

I was worried about someone who could design a 'virtual' read spanning a whole sequence :-) but I agree keeping 'int' is easier.

@magicDGS
Copy link
Copy Markdown
Member Author

If there is any usage case for that kind of virtual read, we can either 1) make it possible to return long in an implementation-dependent manner; or 2) add a new interface for LongCigar extending this one, and add a method to check if it allows long sizes (e.g., with a method that throws by default in Cigar). Other solutions are welcome if this is really needed.

@magicDGS magicDGS force-pushed the dgs_cigar_interface branch from 926e31b to 27c6125 Compare September 9, 2018 13:36
@magicDGS magicDGS force-pushed the dgs_cigar_interface branch from 27c6125 to 6e552ae Compare September 9, 2018 14:01
@magicDGS magicDGS requested a review from a team September 9, 2018 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants