Description
Feature description
Kyle brought up an excellent point re: concerns about lifted-over variants potentially having different sequences/sequence lengths than the original variant.
James had a really good explanation of the problem so I'm copy/pasting it here for reference:
![Figure: colored sequence blocks aligned between Ref A and Ref B]()
So let's say "Ref A" is GRCh37 and "Ref B" is GRCh38. In this figure, each colored block is supposed to be equivalent, i.e. anything in 0:100 on GRCh37 lifts over to the same position within 0:100 on GRCh38. But after GRCh37, our scientists went looking and discovered that there was actually a whole other block (yellow) between (blue) and (orange) on The Real Human Genome, so they updated GRCh38 accordingly.
So then let's say I have a variant on GRCh37 with the location of {"start": 99, "end": 102}. If I lift over the start position, I'm gonna get 99, and if I lift over the end position, I get... 202. So now a 3 base deletion becomes a 103 base deletion. That seems worrisome.
I guess the upshot is that, at minimum, we probably need to double check that liftover doesn't alter the gap between location.start and location.end.
Ideally we'd do something like verify that the sequence is the same but idk
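To make the arithmetic in James's example concrete, here is a minimal toy sketch. The block table and `lift_position` are hypothetical stand-ins invented for illustration (they are not the API of any liftover library), and the coordinates assume the blue block covers 0:100 on both references while the orange block moves from 100:200 to 200:300 once the yellow block is inserted.

```python
# Hypothetical alignment blocks: (source_start, source_end, target_start).
# Assumed layout: blue stays at 0:100; orange shifts from 100:200 to 200:300
# because the yellow block was inserted at 100:200 on the newer reference.
BLOCKS = [
    (0, 100, 0),      # blue
    (100, 200, 200),  # orange
]


def lift_position(pos):
    """Map a single source position to the target reference, or None if unmapped."""
    for src_start, src_end, tgt_start in BLOCKS:
        if src_start <= pos < src_end:
            return tgt_start + (pos - src_start)
    return None


start, end = 99, 102
lifted_start = lift_position(start)
lifted_end = lift_position(end)

original_length = end - start              # 3
lifted_length = lifted_end - lifted_start  # 103: the span now crosses the inserted block
```

Because each endpoint is lifted independently, nothing in this naive approach notices that the two endpoints landed in blocks that are no longer adjacent.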
Use case
We need to ensure that we're accurately lifting over variants.
Acceptance Criteria
- If possible, verify that sequences are identical between the original variant and the lifted-over one
- If that's not feasible, then at least verify that the sequence lengths of both variants are the same
- If the variants' sequences are not the same (or at least the same length), then raise an Exception and do not complete the liftover operation.
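The criteria above could be sketched roughly as follows. This is only an illustration under stated assumptions: `LiftoverValidationError`, `validate_liftover`, and the optional `get_sequence` callable are hypothetical names, locations are assumed to be dicts with `start` and `end` keys as in the example above, and the sequence comparison assumes the lifted variant stays on the same strand.

```python
class LiftoverValidationError(Exception):
    """Hypothetical exception raised when a lifted-over location fails validation."""


def validate_liftover(original, lifted, get_sequence=None):
    """Raise LiftoverValidationError if the lifted location is inconsistent.

    original, lifted: dicts with integer "start" and "end" keys.
    get_sequence: optional callable (an assumption, not a real API) that
        returns the reference sequence for a location dict. If provided,
        the sequences are compared in addition to the lengths.
    """
    original_length = original["end"] - original["start"]
    lifted_length = lifted["end"] - lifted["start"]
    if original_length != lifted_length:
        raise LiftoverValidationError(
            f"Length changed during liftover: {original_length} -> {lifted_length}"
        )

    if get_sequence is not None:
        if get_sequence(original) != get_sequence(lifted):
            raise LiftoverValidationError("Sequence changed during liftover")


# Usage: the problematic example from the description fails the length check.
try:
    validate_liftover({"start": 99, "end": 102}, {"start": 99, "end": 202})
    length_check_failed = False
except LiftoverValidationError:
    length_check_failed = True
```

Checking the length first keeps the cheap test ahead of any sequence lookup, which may require access to both reference assemblies.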
Proposed solution
No response
Alternatives considered
No response
Implementation details
No response
Potential Impact
No response
Additional context
No response
Contribution
None