Draft
Changes from 22 commits
76 commits
5054739
Support indefinite and definite uncertain ranges, and unbalanced unce…
theferrit32 Sep 20, 2023
1f624ad
Rebuild hgvs grammar due to whitespace changes
theferrit32 Sep 20, 2023
2bcccc2
Merge branch 'main' into 225-uncertain-ranges
andreasprlic Dec 11, 2023
722b142
Merge branch '712-fix-dev-install' into 225-uncertain-ranges
andreasprlic Dec 11, 2023
c15c552
feat(test): minor addition to test to make sure the breakpoints are u…
andreasprlic Dec 11, 2023
8a4341a
feat(g_to_c): this adds support for g_to_c of uncertain coordinates. …
andreasprlic Dec 12, 2023
1983654
Merge branch 'main' into 225-uncertain-ranges
andreasprlic Dec 12, 2023
2e4ae22
update CODEOWNERS to @biocommons/maintainers
reece Jan 19, 2024
82d7331
use shared stale action configuration
reece Jan 31, 2024
a045165
import and standardize issue templates from biocommons.example
Jan 31, 2024
43a556c
add standardized github labels and update action
Jan 31, 2024
3ea0c5c
expose clearer name for label sync action
Jan 31, 2024
1e2a433
remove .github/ISSUE_TEMPLATE (in order to use templates in .github r…
Feb 1, 2024
0c4c5d7
migrate to endbug/label-sync with biocommons-wide label config
Feb 1, 2024
4a44582
#688 - remove __future__ usage
davmlaw Feb 7, 2024
6056847
#695 - Remove top level code environment scripts from modules
davmlaw Feb 9, 2024
cce1c6f
update stale action to use stale.yml from worfklow-template
Feb 13, 2024
261245a
feat(imprecise hgvs_c): adding support to create imprecise hgvs_c s.
andreasprlic Feb 20, 2024
52adbe9
Merge branch 'main' into 225-uncertain-ranges
andreasprlic Feb 20, 2024
609244a
feat(imprecise g_to_c): g_to_c now works. c_to_g not working yet. Req…
andreasprlic Feb 21, 2024
0364f37
feat(cleanup): removing broken c_to_g unit test for imprecise events.
andreasprlic Feb 24, 2024
4c7ecbf
Simplify parser to use None for single-digit intervals. Fixed format …
theferrit32 Mar 11, 2024
cec05d3
Merge branch 'main' into 225-uncertain-ranges
andreasprlic Dec 24, 2024
f591a1a
Merge branch 'main' into 225-uncertain-ranges
andreasprlic Jan 20, 2025
8ddecd1
updating test cache
andreasprlic Jan 20, 2025
99e0060
adding a unit test for the examples from #225
andreasprlic Jan 20, 2025
053f686
installing pytest now for CI
andreasprlic Jan 20, 2025
f370655
installing pytest-cov for CI
andreasprlic Jan 21, 2025
b6bf4b6
updating test cache
andreasprlic Jan 21, 2025
1a857f9
more meddling with the CI
andreasprlic Jan 21, 2025
cd949db
fixing grammar based on PR feedback.
andreasprlic Jan 21, 2025
485f4cc
Merge branch 'main' into 225-uncertain-ranges
Apr 15, 2025
e6e16d8
update cassettes after merge with main
Apr 15, 2025
f8e2d2e
update cassettes after merge with main, take 2
Apr 15, 2025
8d7fb05
Merge branches '225-uncertain-ranges' and '225-uncertain-ranges' of g…
andreasprlic May 4, 2025
c69cc69
WIP
andreasprlic Jun 2, 2025
a1a7ece
WIP
andreasprlic Jun 18, 2025
8f8450d
WIP - first test is passing now in both directions.
andreasprlic Jun 24, 2025
77fe885
WIP, now we get through to test 2.
andreasprlic Jun 25, 2025
31f6ad9
WIP now at test 7
andreasprlic Jun 26, 2025
ed2c56a
WIP now at test 8
andreasprlic Jun 26, 2025
04d2398
#699 now adding support for going full g.->c.->g. for uncertain ranges
andreasprlic Jul 1, 2025
587c6fb
Removing prints, updating test cache after running seqrepo locally
andreasprlic Sep 3, 2025
64284cc
adding test cassettes for test_hgvs_sequencevariant
andreasprlic Sep 3, 2025
99b6b96
now with test cache after make test-learn-iteratively (has been renam…
andreasprlic Sep 3, 2025
49cb10a
Merge pull request #773 from andreasprlic/225-uncertain-ranges
andreasprlic Sep 3, 2025
9318465
resolving merge conflicts with latest main branch
andreasprlic Sep 4, 2025
7993a1d
re-ran the test cache now with correct learning
andreasprlic Sep 4, 2025
e96e264
cleanup of pre-commit hook
andreasprlic Sep 4, 2025
19ddbd1
renaming method to be a private method
andreasprlic Sep 4, 2025
8bf485a
Trivial changes to eliminate deepcopy in grammer and elimination of v…
toneillbroad Sep 4, 2025
6c0e85b
Merge pull request #785 from biocommons/225-uncertain-ranges-trivial-…
toneillbroad Sep 4, 2025
c7465fa
removing the option to enforce inner confidence interval only
andreasprlic Sep 4, 2025
efb3395
Merge branch '225-uncertain-ranges' of github.com:biocommons/hgvs int…
andreasprlic Sep 4, 2025
0c3fe2d
Remove vcr test output artifacts
toneillbroad Sep 4, 2025
ad59996
Merge pull request #786 from biocommons/225-remove-vcr-file-artifacts
theferrit32 Sep 4, 2025
855f759
cleanup of test, and moving normalizer bits to using start, end.
andreasprlic Sep 4, 2025
abca23b
Merge branch '225-uncertain-ranges' of github.com:biocommons/hgvs int…
andreasprlic Sep 4, 2025
877e405
cleanup
andreasprlic Sep 4, 2025
cf5a452
adding a normalizer check as well
andreasprlic Sep 4, 2025
57d6195
Refactor vcr test files
toneillbroad Sep 4, 2025
7e291c0
Merge pull request #789 from biocommons/225-vcr-test-files
theferrit32 Sep 4, 2025
ecae174
adding a new test to show that normalization of this variant is broken
andreasprlic Sep 4, 2025
e9839f9
Merge branch '225-uncertain-ranges' of github.com:biocommons/hgvs int…
andreasprlic Sep 5, 2025
a2e9ed2
adding a unit test for normalized CNV DUP
andreasprlic Sep 5, 2025
ec70dbc
Get BabelFish VCF to work with new start/end code
davmlaw Sep 5, 2025
effc67b
Merge branch 'biocommons:225-uncertain-ranges' into 225-uncertain-ranges
davmlaw Sep 5, 2025
98f63d7
Merge pull request #794 from davmlaw/225-uncertain-ranges
andreasprlic Sep 5, 2025
2eb763f
reverting changes to main branch
andreasprlic Sep 5, 2025
a7854d0
Merge branch '225-uncertain-ranges' of github.com:biocommons/hgvs int…
andreasprlic Sep 5, 2025
b3be243
reverting changes to main branch
andreasprlic Sep 5, 2025
7fed4d4
deleting no longer needed code
andreasprlic Sep 5, 2025
a2b7c30
no longer MAX_CHR_SIZE
andreasprlic Sep 5, 2025
8f9a9cc
Make comparison more precise, depending on which side of the comp has…
andreasprlic Sep 5, 2025
7051f58
adding more docstring to explain how get_start_end behaves.
andreasprlic Sep 5, 2025
f9fe947
small fix for get relevat_transcripts, to use new get_start_end approach
andreasprlic Sep 5, 2025
22 changes: 14 additions & 8 deletions src/hgvs/_data/hgvs.pymeta
@@ -2,7 +2,7 @@
# variant specification. The subset is limited to is limited to those
# rules that define sequence variants precisely. It does not current
# cover rules for translocations or conversions.

# The basic structure of a HGVS sequence variant is:
# <ac>:<type>.<posedit>
# where <ac> is a sequence accession, <type> determines the sequence
@@ -26,7 +26,7 @@ r_variant = accn:ac opt_gene_expr:gene ':' 'r':type '.' r_posedit:posedit -> hgv

############################################################################
## HGVS Position -- e.g., NM_01234.5:c.22+6 (without an edit)
-# This is unofficial syntax
+# This is unofficial syntax

hgvs_position = g_hgvs_position | m_hgvs_position | c_hgvs_position | n_hgvs_position | r_hgvs_position | p_hgvs_position

@@ -65,7 +65,7 @@ r_posedit = (r_interval:pos rna_edit:edit -> hgvs.posedit.PosEdit(pos=pos,edit=e
p_posedit = (p_interval:pos pro_edit:edit -> hgvs.posedit.PosEdit(pos=pos,edit=edit))
| ('(' p_interval:pos pro_edit:edit ')' -> hgvs.posedit.PosEdit(pos=pos,edit=edit, uncertain=True))
| p_posedit_special
-p_posedit_special =
+p_posedit_special =
'=':x -> hgvs.posedit.PosEdit(pos=None,edit=x,uncertain=False)
| '(' '=':x ')' -> hgvs.posedit.PosEdit(pos=None,edit=x,uncertain=True)
| '0':x '?' -> hgvs.posedit.PosEdit(pos=None,edit=x,uncertain=True)
@@ -122,20 +122,26 @@ pro_ident = '=' -> hgvs.edit.AARefAlt(ref='',alt=''

# potentially indefinite/uncertain intervals
c_interval = def_c_interval | '(' def_c_interval:iv ')' -> iv._set_uncertain()
-g_interval = def_g_interval | '(' def_g_interval:iv ')' -> iv._set_uncertain()
+g_interval = uncertain_g_interval:iv | ('(' def_g_interval:iv ')' -> iv._set_uncertain()) | def_g_interval
Member
I'm confused by this. The second alternative parse (with parentheses) is for an uncertain g_interval.
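
One way to read the three alternatives, assuming ordered-choice (first match wins) semantics: the new first alternative covers ranges whose endpoints are themselves intervals, such as `(100_200)_(300_400)`, while the parenthesized alternative covers a single interval that is uncertain as a whole, such as `(100_200)`. The regular expressions and the `first_match` helper below are hypothetical stand-ins written for illustration; they are not the project's grammar.

```python
import re

POS = r"(?:\d+|\?)"          # a position: digits, or "?" for unknown (assumed)
DEF = rf"{POS}(?:_{POS})?"   # definite interval: "5" or "5_10"

# Hypothetical stand-ins for the three alternatives of g_interval.
UNCERTAIN_RANGE = re.compile(
    rf"\({DEF}\)_\({DEF}\)|{DEF}_\({DEF}\)|\({DEF}\)_{DEF}"
)
UNCERTAIN_INTERVAL = re.compile(rf"\({DEF}\)")
DEFINITE_INTERVAL = re.compile(DEF)

# Same order as the alternatives in the changed g_interval rule.
PR_ORDER = [
    ("uncertain_range", UNCERTAIN_RANGE),
    ("uncertain_interval", UNCERTAIN_INTERVAL),
    ("definite_interval", DEFINITE_INTERVAL),
]


def first_match(text, rules=PR_ORDER):
    """Return (label, consumed_text) for the first rule matching a prefix of text."""
    for label, pattern in rules:
        m = pattern.match(text)
        if m:
            return label, m.group(0)
    return None
```

With the order reversed, `(100_200)_(300_400)` is consumed only up to `(100_200)` and the remainder is left unparsed, which is why the longer form is tried first in this sketch.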

m_interval = def_m_interval | '(' def_m_interval:iv ')' -> iv._set_uncertain()
n_interval = def_n_interval | '(' def_n_interval:iv ')' -> iv._set_uncertain()
p_interval = def_p_interval | '(' def_p_interval:iv ')' -> iv._set_uncertain()
r_interval = def_r_interval | '(' def_r_interval:iv ')' -> iv._set_uncertain()

# definite intervals
-def_g_interval = (g_pos:start '_' g_pos:end -> hgvs.location.Interval(start,end)) | (g_pos:start -> hgvs.location.Interval(start,copy.deepcopy(start)))
-def_m_interval = (m_pos:start '_' m_pos:end -> hgvs.location.Interval(start,end)) | (m_pos:start -> hgvs.location.Interval(start,copy.deepcopy(start)))
-def_p_interval = (p_pos:start '_' p_pos:end -> hgvs.location.Interval(start,end)) | (p_pos:start -> hgvs.location.Interval(start,copy.deepcopy(start)))
-def_r_interval = (r_pos:start '_' r_pos:end -> hgvs.location.Interval(start,end)) | (r_pos:start -> hgvs.location.Interval(start,copy.deepcopy(start)))
+def_g_interval = (g_pos:start '_' g_pos:end -> hgvs.location.Interval(start,end)) | (g_pos:start -> hgvs.location.Interval(start,None))
Member
I like consistency across parallel ideas. It seems to me that we should either decide to deepcopy in the grammar or not, and do it only one way.

What led to changing from the current pattern? (I'm not saying that one is better than the other, but consistency is easier to understand.)
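
To make the two conventions in this hunk concrete, here is a minimal sketch. `Pos`, `Ivl`, `single_with_copy`, `single_with_none`, and `effective_end` are hypothetical names invented for illustration; they are not the `hgvs.location` API, and the sketch does not claim to show how the library resolves a `None` end.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pos:
    """Hypothetical stand-in for a simple position."""
    base: int


@dataclass
class Ivl:
    """Hypothetical stand-in for an interval; end may be None."""
    start: Pos
    end: Optional[Pos] = None

    def effective_end(self) -> Pos:
        # Fall back to start when no explicit end was stored.
        return self.start if self.end is None else self.end


def single_with_copy(pos: Pos) -> Ivl:
    """Convention A: store an independent copy of start as end."""
    return Ivl(pos, copy.deepcopy(pos))


def single_with_none(pos: Pos) -> Ivl:
    """Convention B: store None and let consumers resolve the end later."""
    return Ivl(pos, None)
```

Under convention A the two endpoints can drift apart if one is mutated; under convention B every consumer has to handle `end is None`. Mixing the two across interval types means callers must handle both cases.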

+def_m_interval = (m_pos:start '_' m_pos:end -> hgvs.location.Interval(start,end)) | (m_pos:start -> hgvs.location.Interval(start,None))
+def_p_interval = (p_pos:start '_' p_pos:end -> hgvs.location.Interval(start,end)) | (p_pos:start -> hgvs.location.Interval(start,None))
+def_r_interval = (r_pos:start '_' r_pos:end -> hgvs.location.Interval(start,end)) | (r_pos:start -> hgvs.location.Interval(start,None))
def_c_interval = (c_pos:start '_' c_pos:end -> hgvs.location.BaseOffsetInterval(start,end)) | (c_pos:start -> hgvs.location.BaseOffsetInterval(start,copy.deepcopy(start)))
def_n_interval = (n_pos:start '_' n_pos:end -> hgvs.location.BaseOffsetInterval(start,end)) | (n_pos:start -> hgvs.location.BaseOffsetInterval(start,copy.deepcopy(start)))

+# indefinite ranges
+uncertain_g_interval = '(' def_g_interval:ivl_start ')' '_' '(' def_g_interval:ivl_end ')' -> hgvs.location.Interval(start=ivl_start._set_uncertain(), end=ivl_end._set_uncertain())
Member
Minor: "iv" is used elsewhere; please keep for consistency

+| def_g_interval:ivl_start '_' '(' def_g_interval:ivl_end ')' -> hgvs.location.Interval(start=ivl_start, end=ivl_end._set_uncertain())
+| '(' def_g_interval:ivl_start ')' '_' def_g_interval:ivl_end -> hgvs.location.Interval(start=ivl_start._set_uncertain(), end=ivl_end)
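
The rule above builds an outer interval whose `start` and `end` are themselves intervals, each optionally flagged uncertain. A rough sketch of that nesting, and of how it could print back to the three accepted forms, follows. `Pos` and `Ivl` are hypothetical stand-ins, not the `hgvs.location` classes, and the formatting logic is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Pos:
    """Hypothetical stand-in for a simple position."""
    base: int

    def __str__(self) -> str:
        return str(self.base)


@dataclass
class Ivl:
    """Hypothetical interval whose endpoints may be positions or intervals."""
    start: Union[Pos, "Ivl"]
    end: Union[Pos, "Ivl", None] = None
    uncertain: bool = False

    def __str__(self) -> str:
        # A missing end means a single-position interval.
        body = str(self.start) if self.end is None else f"{self.start}_{self.end}"
        # Parentheses mark an uncertain interval.
        return f"({body})" if self.uncertain else body
```

For example, `Ivl(Ivl(Pos(100), Pos(200), uncertain=True), Ivl(Pos(300), Pos(400), uncertain=True))` prints as `(100_200)_(300_400)`.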


# positions
c_pos = def_c_pos #| '(' def_c_pos:pos ')' -> pos._set_uncertain()
g_pos = def_g_pos #| '(' def_g_pos:pos ')' -> pos._set_uncertain()
48 changes: 38 additions & 10 deletions src/hgvs/alignmentmapper.py
@@ -27,6 +27,10 @@
#


from __future__ import absolute_import, division, print_function, unicode_literals

from typing import Optional

from bioutils.coordinates import strand_int_to_pm
from six.moves import range

@@ -39,6 +43,7 @@
HGVSInvalidIntervalError,
HGVSUsageError,
)
+from hgvs.location import Interval, BaseOffsetInterval
from hgvs.utils import build_tx_cigar
from hgvs.utils.cigarmapper import CIGARMapper

@@ -151,16 +156,29 @@ def __str__(self):
)
)

-    def g_to_n(self, g_interval, strict_bounds=None):
+    def g_to_n(self, g_interval: Interval, strict_bounds:Optional[bool]=None)->BaseOffsetInterval:
         """convert a genomic (g.) interval to a transcript cDNA (n.) interval"""

         if strict_bounds is None:
             strict_bounds = global_config.mapping.strict_bounds

-        grs, gre = (
-            g_interval.start.base - 1 - self.gc_offset,
-            g_interval.end.base - 1 - self.gc_offset,
-        )
+        # in case of uncertain ranges, we fall back to the inner (more confident) interval
+        if g_interval.start.uncertain:
+            grs = g_interval.start.end.base - 1 - self.gc_offset
+        else:
+            if isinstance(g_interval.start, Interval):
+                grs = g_interval.start.start.base - 1 - self.gc_offset
+            else:
+                grs = g_interval.start.base - 1 - self.gc_offset
+
+        if g_interval.end.uncertain:
+            gre = g_interval.end.start.base - 1 - self.gc_offset
+        else:
+            if isinstance(g_interval.end, Interval):
+                gre = g_interval.end.end.base - 1 - self.gc_offset
+            else:
+                gre = g_interval.end.base - 1 - self.gc_offset
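
The branch logic above picks the inner edges of an uncertain range: the right edge of an uncertain start and the left edge of an uncertain end. A self-contained sketch of that selection follows. `Pos`, `Ivl`, and `inner_bounds` are hypothetical names, not the library's types, and the fallback for an interval without an explicit end is an assumption. The sketch returns 1-based bases, whereas the diff also subtracts 1 and `self.gc_offset` to obtain zero-based alignment coordinates.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class Pos:
    """Hypothetical stand-in for a simple position."""
    base: int
    uncertain: bool = False


@dataclass
class Ivl:
    """Hypothetical interval whose endpoints may be positions or intervals."""
    start: Union[Pos, "Ivl"]
    end: Union[Pos, "Ivl", None] = None
    uncertain: bool = False


def _left(x) -> int:
    """Leftmost base of a position or interval."""
    return _left(x.start) if isinstance(x, Ivl) else x.base


def _right(x) -> int:
    """Rightmost base; an interval without an explicit end uses its start."""
    if isinstance(x, Ivl):
        return _right(x.end if x.end is not None else x.start)
    return x.base


def inner_bounds(iv: Ivl) -> Tuple[int, int]:
    """Return the 1-based (start, end) of the inner, more confident span."""
    start = iv.start
    end = iv.end if iv.end is not None else iv.start
    # Uncertain start endpoint: take its right edge; otherwise its left edge.
    s = _right(start) if start.uncertain else _left(start)
    # Uncertain end endpoint: take its left edge; otherwise its right edge.
    e = _left(end) if end.uncertain else _right(end)
    return s, e
```

For `(100_200)_(300_400)` this yields `(200, 300)`, the span that is covered under every reading of the uncertain endpoints.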

         # frs, fre = (f)orward (r)na (s)tart & (e)nd; forward w.r.t. genome
         frs, frs_offset, frs_cigar = self.cigarmapper.map_ref_to_tgt(
             pos=grs, end="start", strict_bounds=strict_bounds
@@ -174,17 +192,24 @@ def g_to_n(self, g_interval, strict_bounds=None):
             frs_offset, fre_offset = -fre_offset, -frs_offset

         # The returned interval would be uncertain when locating at alignment gaps
+        # of if the initial interval was uncertain
         return hgvs.location.BaseOffsetInterval(
             start=hgvs.location.BaseOffsetPosition(
-                base=_zbc_to_hgvs(frs), offset=frs_offset, datum=Datum.SEQ_START
+                base=_zbc_to_hgvs(frs),
+                offset=frs_offset,
+                datum=Datum.SEQ_START,
+                uncertain=g_interval.start.uncertain
             ),
             end=hgvs.location.BaseOffsetPosition(
-                base=_zbc_to_hgvs(fre), offset=fre_offset, datum=Datum.SEQ_START
+                base=_zbc_to_hgvs(fre),
+                offset=fre_offset,
+                datum=Datum.SEQ_START,
+                uncertain=g_interval.end.uncertain
             ),
             uncertain=frs_cigar in "DI" or fre_cigar in "DI",
         )
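
In the returned value, the position-level `uncertain` arguments are copied from the input endpoints, while the interval-level flag depends only on the CIGAR operations at the two mapped endpoints. A minimal sketch of that interval-level test follows. The helper name `spans_alignment_gap` is hypothetical, and the sketch assumes each endpoint reports a single-character CIGAR operation.

```python
def spans_alignment_gap(start_op: str, end_op: str) -> bool:
    """True if either endpoint maps onto a deletion (D) or insertion (I) op."""
    gap_ops = ("D", "I")  # tuple membership: only exact single-op matches count
    return start_op in gap_ops or end_op in gap_ops
```

The diff's `x in "DI"` is a substring test, which gives the same answer for single-character operations. The tuple membership used here differs only for inputs such as an empty string, which is a substring of every string.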

-    def n_to_g(self, n_interval, strict_bounds=None):
+    def n_to_g(self, n_interval, strict_bounds=None) ->Interval:
         """convert a transcript (n.) interval to a genomic (g.) interval"""

         if strict_bounds is None:
@@ -216,7 +241,7 @@ def n_to_g(self, n_interval, strict_bounds=None):
             uncertain=grs_cigar in "DI" or gre_cigar in "DI",
         )

-    def n_to_c(self, n_interval, strict_bounds=None):
+    def n_to_c(self, n_interval:Interval, strict_bounds:Optional[bool]=None):
         """convert a transcript cDNA (n.) interval to a transcript CDS (c.) interval"""

         if strict_bounds is None:
@@ -246,7 +271,10 @@ def pos_n_to_c(pos):
             else:
                 c = pos.base - self.cds_end_i
                 c_datum = Datum.CDS_END
-            return hgvs.location.BaseOffsetPosition(base=c, offset=pos.offset, datum=c_datum)
+            return hgvs.location.BaseOffsetPosition(base=c,
+                                                    offset=pos.offset,
+                                                    datum=c_datum,
+                                                    uncertain=pos.uncertain)

         c_interval = hgvs.location.BaseOffsetInterval(
             start=pos_n_to_c(n_interval.start),