Commit 7871872

update RLE normalization
1 parent 4e7ed11 commit 7871872

1 file changed: +29 -7 lines changed

docs/source/impl-guide/normalization.rst

Lines changed: 29 additions & 7 deletions
@@ -91,8 +91,7 @@ the following normalization rules apply:
 and `Alternate Allele Sequence`.
 
 #. one is empty, the input Allele is an insertion (empty `reference
-sequence`) or a deletion (empty `alternate sequence`). Store the length
-of the non-empty sequence: this is the `Repeat Subunit Length`. Continue to
+sequence`) or a deletion (empty `alternate sequence`). Continue to
 step 3.
 
 #. Determine bounds of ambiguity.
@@ -112,12 +111,35 @@ the following normalization rules apply:
 
 #. Construct a new Allele covering the entire region of ambiguity.
 
-a. If the `reference sequence` is empty, this is an unambiguous
-insertion. Return a new `Allele` with the trimmed `alternate
-sequence` as a `Literal Sequence Expression`.
+a. If the expanded `Reference Allele Sequence` is empty, this is an unambiguous insertion.
+Return a new `Allele` with the trimmed `Alternate Allele Sequence` as a `Literal
+Sequence Expression`.
 
-#. Otherwise, return a new `Allele` using a `reference length
-expression`, using a `Location` specified by the coordinates
+#. Otherwise, find the greatest common denominator between the length of the expanded `Reference
+Allele Sequence` and the expanded `Alternate Allele Sequence`. This is the `repeat subunit length`.
+
+#. If the Allele is a deletion (the `Alternate Allele Sequence` is shorter than the
+`Reference Allele Sequence`) return a new Allele using a `Location` specified by the coordinates
+of the `left_roll_bound` and `right_roll_bound`, a `length` specified by the length of the
+`Alternate Allele Sequence`, and a `repeat subunit length` as calculated in the prior step.
+
+#. If the Allele is an insertion (the `Reference Allele Sequence` is shorter than the
+`Alternate Allele Sequence`), check that the first `repeat subunit length` number of characters
+of the `Reference Allele Sequence` can be cycled to reconstruct the `Alternate Allele Sequence`.
+
+1. If so, return a new Allele using a `Location` specified by the coordinates of the `left_roll_bound`
+and `right_roll_bound`, and a `Reference Length Expression` with a `length` specified by the length
+of the `Alternate Allele Sequence`, and a `repeat subunit length` as previously calculated.
+
+#. If not, return a new Allele using a `Location` specified by the coordinates of the `left_roll_bound`
+and `right_roll_bound`, and a `Literal Sequence Expression` with the expanded `Alternate Allele Sequence`.
+
+
+return a new Allele using a `Location` specified by the coordinates
+of the `left_roll_bound` and `right_roll_bound`, a `length` specified by the length of the
+`Alternate Allele Sequence`, and a `repeat subunit length` as calculated in the prior step.
+
+using a `Location` specified by the coordinates
 of the `left_roll_bound` and `right_roll_bound`, a `length`
 specified by the length of the `alternate allele`, and a
 `repeat subunit length` as determined in step 2c.
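
The branching added in this hunk can be sketched in Python as follows. This is a minimal illustration, not code from the repository: the function name `choose_expression` and the tuple return shapes are hypothetical, and `Location` construction from `left_roll_bound` and `right_roll_bound` is left out. It reads the text's "greatest common denominator" as the greatest common divisor of the two lengths, and it assumes the deletion branch yields a `Reference Length Expression`, which the added text implies but does not state outright.

```python
from math import gcd


def choose_expression(ref_seq: str, alt_seq: str) -> tuple:
    """Pick the expression for the expanded reference/alternate sequences.

    Returns ("literal", sequence) or ("rle", length, repeat_subunit_length).
    Both return shapes are hypothetical, chosen only for this sketch.
    """
    # Step a: empty expanded reference means an unambiguous insertion.
    if not ref_seq:
        return ("literal", alt_seq)

    # Repeat subunit length: greatest common divisor of the two lengths.
    subunit_len = gcd(len(ref_seq), len(alt_seq))

    # Deletion: the alternate is shorter than the reference.
    # Assumption: this branch produces a Reference Length Expression.
    if len(alt_seq) < len(ref_seq):
        return ("rle", len(alt_seq), subunit_len)

    # Insertion: check that cycling the leading subunit of the reference
    # rebuilds the alternate sequence exactly.
    if len(ref_seq) < len(alt_seq):
        unit = ref_seq[:subunit_len]
        if unit * (len(alt_seq) // subunit_len) == alt_seq:
            return ("rle", len(alt_seq), subunit_len)
        return ("literal", alt_seq)

    # Equal lengths are not covered by the steps in this hunk.
    raise ValueError("equal-length sequences are outside this sketch")


# A CAG repeat expanding from two copies to three cycles cleanly.
print(choose_expression("CAGCAG", "CAGCAGCAG"))
# An insertion that does not repeat the reference falls back to a literal.
print(choose_expression("CAG", "CAGTT"))
```

Because the subunit length divides the alternate length by construction, whole-unit repetition is enough for the cycle check and no partial trailing unit needs handling.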
