Allow multiple SNP transcripts in `plot_diplotype_clustering_advanced()` #703

jonbrenas · 2024-12-12T15:43:53Z

Resolves #600.

@KellyLBennett and Nana Amoako needed this functionality earlier today so I sped it up a bit. There are currently no tests for plot_diplotype_clustering_advanced() so the best I can say is that it worked in the notebook. I will add tests when I have time.

review-notebook-app · 2024-12-12T15:43:57Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

alimanfoo

Thanks so much @jonbrenas. Couple of small comments. Could you also post a screenshot of this working?

malariagen_data/anoph/dipclust.py

alimanfoo · 2024-12-13T17:35:56Z

malariagen_data/anopheles.py

Not clear why there are any changes to this file included in this PR?

AnophelesDipClustAnalysis needs to inherit gene_cnv from AnophelesCnvFrequencyAnalysis (see comment below) which created a loop in the inheritance tree. Because AnophelesDipClustAnalysis is a subclass of AnophelesCnvFrequencyAnalysis and AnophelesSnpFrequencyAnalysis, when AnophelesDataResource inherits AnophelesDipClustAnalysis, it also inherits them so they don't need to be included in the list of inherited classes anymore.
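A minimal sketch of the hierarchy described above, assuming the "loop" refers to Python's method resolution order (MRO) rejecting a base list that names parent classes ahead of their own subclass. The class names are simplified stand-ins, not the real `malariagen_data` classes, and the method bodies are placeholders.

```python
class CnvFrequencyAnalysis:
    def gene_cnv(self):
        return "gene_cnv result"  # placeholder for the real CNV computation


class SnpFrequencyAnalysis:
    def snp_frequencies(self):  # hypothetical method name
        return "snp result"


class DipClustAnalysis(CnvFrequencyAnalysis, SnpFrequencyAnalysis):
    """Inherits gene_cnv() so it can draw CNV and SNP tracks together."""

    def tracks(self):
        return (self.gene_cnv(), self.snp_frequencies())


def try_redundant_bases():
    """Try listing both parents ahead of their subclass.

    Returns the TypeError message if Python rejects the base list,
    or None if the class could be created.
    """
    try:
        type(
            "Broken",
            (CnvFrequencyAnalysis, SnpFrequencyAnalysis, DipClustAnalysis),
            {},
        )
    except TypeError as exc:
        return str(exc)
    return None


class DataResource(DipClustAnalysis):
    """Listing only the subclass is enough: the parents come along with it."""
```

Under these assumptions `try_redundant_bases()` returns an error message, while `DataResource` still exposes `gene_cnv()` through `DipClustAnalysis`.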

malariagen_data/anoph/dipclust.py

jonbrenas · 2024-12-13T18:04:38Z

Here is a picture (the chosen genes were Cyp6aa1 and Cyp6p15p):

We might want to add titles to make things clearer.

sanjaynagi · 2024-12-16T10:19:42Z

awesome.

leehart · 2025-01-31T15:11:25Z

Thanks @jonbrenas. There seems to be some general disagreement around how to approach the issue of changing parameters and causing breaking changes. I'll try to formulate a diplomatic strategy in order to resolve my PR #691, which might hopefully shed some light on an agreeable way forward and also apply to this case, as a package-wide policy.

leehart · 2025-01-31T16:50:20Z

@jonbrenas @alimanfoo I'm coming around to the idea that we should, as many other packages do, maintain support for deprecated parameters for a limited time (although I'm not sure how we'll decide when to drop support) but issue a DeprecationWarning that prepares users for the upgrade path.

This introduces the problem of kicking the can down the road, in terms of when to eventually drop support completely, but alleviates the problem of users' code suddenly not working without any warning or clue how to fix it, assuming they haven't read the upgrade notes or latest docs.
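As a rough sketch of what that could look like, assuming the old `snp_transcript` name is kept alongside a new `snp_transcripts` parameter. The helper name `resolve_transcripts` is hypothetical and is not part of the package.

```python
import warnings
from typing import List, Optional, Sequence


def resolve_transcripts(
    snp_transcripts: Optional[Sequence[str]] = None,
    snp_transcript: Optional[str] = None,  # deprecated spelling
) -> List[str]:
    """Return the transcripts to plot, accepting the old parameter for now."""
    if snp_transcript is not None and snp_transcripts is not None:
        raise TypeError(
            "Pass either 'snp_transcript' or 'snp_transcripts', not both."
        )
    if snp_transcript is not None:
        # stacklevel=2 attributes the warning to the caller's line.
        warnings.warn(
            "'snp_transcript' is deprecated; use 'snp_transcripts' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return [snp_transcript]
    return list(snp_transcripts or [])
```

Existing calls keep working and get a pointer to the new name, and the warning message is also a natural place to state when the old parameter will be dropped.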

This strategy would particularly apply to situations like this, where we naturally want to change the name and nature of a parameter to suit some added or extended functionality. It would give us the freedom to develop the user-facing interface (the public function signatures) towards clarity and semantic consistency without risking too much friction, allowing smoother and friendlier transitions towards API improvements without sacrificing clarity.

But I'm still not sure what the policy should be for how long we allow deprecated parameters to linger around and clutter up the interface and codebase. I suppose we could either have a time-based approach, where we completely drop features if they have been deprecated in this manner for over, say, 6 months. Or we could have a release-based approach, where we drop deprecated features after having supported them for, say, 2 unrelated version increments. I guess this creates a sort of tsunami of delayed feature drops and a big cull whenever we bite the bullet and actually cut a major release. We could inform users ahead of time when the feature would be dropped, e.g. in the deprecation warning and docs. Another approach might be to somehow monitor the usage of the old parameter, and finally drop it when usage falls below a certain percentage, in favour of the new parameter... but that sounds hard. I suppose we could also decide on some kind of mixed policy, e.g. whichever comes first: a new major release or 6 months of deprecation.
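The mixed "whichever comes first" rule could be written down as a small check. This is only an illustration: the function name, the 183-day default (roughly 6 months), and the idea of recording the major version at deprecation time are all assumptions, not an agreed policy.

```python
from datetime import date, timedelta


def should_drop(
    deprecated_on: date,
    deprecated_in_major: int,
    today: date,
    current_major: int,
    max_age_days: int = 183,  # assumed stand-in for "6 months"
) -> bool:
    """Return True once either removal condition has been met."""
    aged_out = (today - deprecated_on) >= timedelta(days=max_age_days)
    major_bumped = current_major > deprecated_in_major
    return aged_out or major_bumped
```

For example, a parameter deprecated on 2024-12-13 would still be supported on 2025-02-05 within the same major version, but not after a major version bump or once the age limit has passed.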

Putting those details aside, I can certainly see how this new approach should (if adopted) reduce the number of major releases that would otherwise be made under a policy of semantic versioning and fearless development, e.g. whenever we slightly tweak the name of one parameter, as well as being generally more user-friendly.

sanjaynagi · 2025-02-05T08:37:05Z

Personally, I think it should remain as snp_transcript, since the canonical usage is with a single transcript.
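One way to keep the `snp_transcript` name while still accepting several transcripts, in the spirit of the commit titled "Made snp_transcript polymorphic to avoid breaking the API", is to normalise the argument up front. This is a sketch under that assumption, not the merged code; the helper name and the placeholder transcript IDs are hypothetical.

```python
from typing import List, Optional, Sequence, Union


def normalise_snp_transcript(
    snp_transcript: Optional[Union[str, Sequence[str]]] = None,
) -> List[str]:
    """Accept None, a single transcript ID, or several IDs; return a list."""
    if snp_transcript is None:
        return []
    # A str is itself a sequence of characters, so check for it first.
    if isinstance(snp_transcript, str):
        return [snp_transcript]
    return list(snp_transcript)
```

With this shape, single-transcript calls stay unchanged and callers who want several tracks pass a list.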

On Wed, 5 Feb 2025, 08:29, Lee wrote:

> In malariagen_data/anoph/dipclust.py:
>
> - snp_transcript: Optional[base_params.transcript] = None,
> + snp_transcripts: Sequence[base_params.transcript] = [],
>
> Hi @jonbrenas. Since this issue is high priority, to be pragmatic and diplomatic, I reckon we should go with @alimanfoo's suggestion in this situation, but it would still be good to try to reach agreement on a package-wide policy for consistency. We would still need to work out the details for that (e.g. param deprecation strategy, perhaps via PR #691) anyway, which would probably delay this enhancement beyond justification. What do you think? Personally I'm still not entirely comfortable with this, maybe because I'm an idealist at heart... I suppose compromise can be as difficult as naming things!

sanjaynagi · 2025-02-05T08:38:46Z

Well actually, I see what you mean with how we use 'sample_sets' for example. Understandable to want consistency.

leehart

Thanks @jonbrenas. I reckon this is good to go, unless there's anything more you think needs doing here.

leehart · 2025-02-05T12:34:28Z

Merging soon unless objections.

Updated the notebook

34c24c8

jonbrenas self-assigned this Dec 12, 2024

jonbrenas marked this pull request as draft December 12, 2024 15:48

jonbrenas added 6 commits December 12, 2024 15:50

Better typing

75b4cc3

More tests and some trimming of the hierarchy in anoph

6b9767d

Testing something

a2c4275

Better typing, hopefully

b74a0a8

Removed the heterozygosity block

832ba29

Another erroneous type

911010b

jonbrenas marked this pull request as ready for review December 13, 2024 13:02

alimanfoo reviewed Dec 13, 2024

jonbrenas added 3 commits December 13, 2024 18:22

Merge branch 'master' into 600-adding-option-for-multiple-transcripts-to-diplotype_clustering

fbb1a39

Made snp_transcript polymorphic to avoid breaking the API

7a8aaf6

Merge branch '600-adding-option-for-multiple-transcripts-to-diplotype_clustering' of github.com:malariagen/malariagen-data-python into 600-adding-option-for-multiple-transcripts-to-diplotype_clustering

d57e8bc

jonbrenas marked this pull request as draft December 13, 2024 19:05

Better typing

4d0fac4

jonbrenas marked this pull request as ready for review December 13, 2024 19:30

Merge branch 'master' into 600-adding-option-for-multiple-transcripts-to-diplotype_clustering

7104cdb

leehart added the high priority label Jan 24, 2025

leehart requested review from alimanfoo and leehart and removed request for alimanfoo January 24, 2025 11:25

This was referenced Jan 31, 2025

cnv_discordant_read_calls has a contig param but supports contigs #674

Open

Add more tests for diplotype clustering #715

Closed

leehart approved these changes Feb 5, 2025

Merge branch 'master' into 600-adding-option-for-multiple-transcripts-to-diplotype_clustering

450235b

leehart merged commit 0197957 into master Feb 6, 2025
10 checks passed

leehart deleted the 600-adding-option-for-multiple-transcripts-to-diplotype_clustering branch February 6, 2025 09:29

leehart changed the title ~~Allow multiple snp_transcripts in plot_diplotype_clustering_advanced()~~ Allow multiple SNP transcripts in plot_diplotype_clustering_advanced() Mar 3, 2025

5 participants
