Skip to content

Conversation

theferrit32
Copy link

No description provided.

@theferrit32 theferrit32 requested a review from a team September 20, 2023 16:29
@theferrit32 theferrit32 self-assigned this Sep 20, 2023
@theferrit32 theferrit32 requested a review from reece as a code owner September 20, 2023 16:29
@theferrit32
Copy link
Author

theferrit32 commented Sep 20, 2023

Haven't added for c. yet. The hgvs.pymeta changes are likely similar but will need to add tests.

Can use the expressions from here: #225

@korikuzma
Copy link
Contributor

@biocommons/hgvs-maintainers Would we be able to get feedback on this PR?

Copy link

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Dec 10, 2023
@andreasprlic
Copy link
Member

andreasprlic commented Dec 11, 2023

It would be great to get this PR in. Can we just make sure the "uncertain" field on BaseOffsetPosition or SimplePosition gets set correctly (and tested in the unit test)?

If we express a range in parenthesis, I'd assume we want to express that the whole range is uncertain. At least that's how I'd read the hgvs spec.

Thanks!

andreasprlic
andreasprlic previously approved these changes Dec 11, 2023
Copy link
Member

@andreasprlic andreasprlic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@andreasprlic andreasprlic removed the stale Issue is stale and subject to automatic closing label Dec 11, 2023
…Since the behavior is a bit unspecified, we fall back to the inner (confident) interval of the uncertain range for this projection.
@andreasprlic
Copy link
Member

Playing around with this branch some more I realized that g_to_c projection was not yet working. As such I had a go at enabling this. Since the behavior of this projection is somewhat undefined, the current approach picks the inner (=confident) interval of the uncertain range and uses that for the .c projection. The resulting hgvs_c string is then "confident" and not "uncertain" any longer.

Copy link

github-actions bot commented Feb 2, 2024

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Feb 2, 2024
Copy link

This PR was closed because it has been stalled for 7 days with no activity.

@github-actions github-actions bot closed this Feb 10, 2024
@andreasprlic andreasprlic reopened this Feb 10, 2024
@andreasprlic andreasprlic requested a review from a team as a code owner February 10, 2024 01:49
@andreasprlic
Copy link
Member

Can we work on this one on Monday during our dev session?

):
"""This is a unit test for the clinvar uncertain ranges described in
issue #225: https://github.com/biocommons/hgvs/issues/225.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reece here are the unit tests for #225

@reece
Copy link
Member

reece commented Apr 14, 2025

Note: some of the provided c. examples in #225 are most likely not valid hgvs: e.g. I think this is not valid: NM_000304.3:c.-34-?_*1140dup1657. In the current form of this PR, after g_to_c this becomes: NM_000304.3:c.(?_-34)_(*1140_?)dup.

Discussion re: representation and maintenance of precision during projection

  • Andreas: Retain precision of inner and outer boundaries in g_to_c and c_to_g; see example above

From #225, clinvar 251062

  • clinvar g : NC_000019.9:g.(11211022_11213339)_(11217364_11218067)dup
  • clinvar c : NM_000527.5:c.(190+1_191-1)_(817+1_818-1)dup

On current #699 branch:

>>> var_g1 = parse("NC_000019.9:g.(11211022_11213339)_(11217364_11218067)dup")
>>> var_g1
SequenceVariant(ac=NC_000019.9, type=g, posedit=(11211022_11213339)_(11217364_11218067)dup, gene=None)
>>> var_c1 = am38.g_to_c(var_g1, "NM_000527.5")
>>> var_c1
SequenceVariant(ac=NM_000527.5, type=c, posedit=(?_191-1)_(817+1_?)dup, gene=None)
>>> var_g2 = am38.c_to_g(var_c1)
Traceback (most recent call last):
  Cell In[16], line 1
    var_g2 = am38.c_to_g(var_c1)
  File ~/projects/biocommons/hgvs/src/hgvs/assemblymapper.py:131 in c_to_g
    return self._maybe_normalize(var_out)
  File ~/projects/biocommons/hgvs/src/hgvs/assemblymapper.py:265 in _maybe_normalize
    return self._norm.normalize(var)
  File ~/projects/biocommons/hgvs/src/hgvs/normalizer.py:119 in normalize
    if var.posedit.pos.start.base < 0 or not is_valid_pos(var.ac, var.posedit.pos.end.base):
AttributeError: 'Interval' object has no attribute 'base'

Taking var_c1 above, new additional tests in tests/test_hgvs_sequencevariant.py (#699):

  • hgvs g: NC_000019.9:g.(?11213339)(11217364_?)dup
  • hgvs c: NM_000527.5:c.(?191-1)(817+1_?)dup

While these are good tests, they are not testing the same thing as the clinvar variants. We should generally try for round-trip variant projection and expect to explain cases where this isn't true.

Grammar changes

  • Kyle: Address comments re: grammar in PR
  • Reece: find or recreate table of grammar rules and classes [found! here]
  • Kyle & Reece: collaborate on grammar and class changes to support interval of intervals
Clinvar genomic  NC_000017.10:g.(?_15133094)_(15164078_?)dup
hgvs:            NM_000304.3                 :c. (?_-34)_(*1140_?)  dup
ClinVar has: NM_000304.3(PMP22)  :c. -34-? _ *1140        dup1657

Kinds of locations

                          (190,None)
                        (190+1,None)
         (190+1       _        191-1)
(190+1 _ 191-1) _ (817+1 _ 818-1)
>>> v = parse("NM_000304.3(PMP22):c.-34-?_*1140dup1657")
HGVSParseError: NM_000304.3(PMP22):c.-34-?_*1140dup1657: char 25: expected a digit

-34-? is not be parsed by hgvs currently because offset = snum | (-> 0) (hgvs.pymeta:202)

Other Action Items:

  • Reece: Reconcile main w/current branch
  • Reschedule maintainers' meeting to discuss
  • Separate issue: Add support for ? in offset. Done. See Support uncertain BaseOffset positions #770
  • Reece: raise issue re: use of dup in HGVS v. real-world use by LDs.

@reece reece marked this pull request as draft April 14, 2025 17:10
@reece
Copy link
Member

reece commented Apr 15, 2025

@andreasprlic @theferrit32 I did a bunch of surgery on project config diffs between main and this branch. They're now in sync. Tests pass, including on github (i.e., the cache is intact)

Please destroy your venv and rebuild with make devready to test.

@davmlaw
Copy link
Contributor

davmlaw commented Sep 2, 2025

It would be good if you could summarise the current state of things and what needs to be done.

From my quick skim - I'm worried about the silent conversion between types eg g_to_n. The HGVS spec doesn't list how conversions should be done, so we are on our own.

In g_to_n - comment is "in case of uncertain ranges, we fall back to the inner (more confident) interval"

This is a valid choice, but I argue it's not the only one.

I would argue:

  • We can't convert things silently - the only possible default is to return "p.?" or "c.?" (unless we calculate all possibilities?)
  • We should offer a SequenceVariant convenience method to return a new SequenceVariant object representing the certain inner interval, so clients can call g_to_n with that explicitly
  • We could offer an option in mapper to do this (eg uncertain_conversion='inner' but better name than I can think of now) which would do your current behavior

Copy link

gitguardian bot commented Sep 3, 2025

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
15770347 Triggered PostgreSQL Credentials a1a7ece misc/docker-compose.yml View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keep alive exempt issue from staleness checks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants