-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow multiple snp_transcripts in plot_diplotype_clustering_advanced()
#703
Allow multiple snp_transcripts in plot_diplotype_clustering_advanced()
#703
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much @jonbrenas. Couple of small comments. Could you also post a screenshot of this working?
malariagen_data/anopheles.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear why there are any changes to this file included in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AnophelesDipClustAnalysis
needs to inherit gene_cnv
from AnophelesCnvFrequencyAnalysis
(see comment below) which created a loop in the inheritance tree. Because AnophelesDipClustAnalysis
is a subclass of AnophelesCnvFrequencyAnalysis
and AnophelesSnpFrequencyAnalysis
, when AnophelesDataResource
inherits AnophelesDipClustAnalysis
, it also inherits them so they don't need to be included in the list of inherited classes anymore.
…-to-diplotype_clustering
…_clustering' of github.com:malariagen/malariagen-data-python into 600-adding-option-for-multiple-transcripts-to-diplotype_clustering
awesome. |
…-to-diplotype_clustering
Thanks @jonbrenas . There seems to be some general disagreement around how to approach the issue of changing parameters and causing breaking changes. I'll try to formulate a diplomatic strategy in order to resolve my PR #691 , which might hopefully shed some light on agreeable way forward and also apply to this case, as a package-wide policy. |
@jonbrenas @alimanfoo I'm coming around to the idea that we should, as many other packages do, maintain support for deprecated parameters for a limited time (although I'm not sure how we'll decide when to drop support) but issue a This introduces the problem of kicking the can down the road, in terms of when to eventually drop support completely, but alleviates the problem of users' code suddenly not working without any warning or clue how to fix it, assuming they haven't read the upgrade notes or latest docs. This strategy would particularly apply to situations like this were we naturally want to change the name and nature of a parameter in order to suit some added or extended functionality. It would allow us the freedom to develop the user-facing interface (the public function signatures) towards clarity and semantic consistency without risking too much friction, allowing smoother and friendlier transitions towards API improvements in functionality without risking clarity. But I'm still not sure what the policy should be for how long we allow deprecated parameters to linger around and clutter up the interface and codebase. I suppose we could either have a time-based approach, where we completely drop features if they have been deprecated in this manner for over, say, 6 months. Or we could have a release-based approach, where we drop deprecated features after having supported them for, say, 2 unrelated version increments. I guess this creates a sort of tsunami of delayed feature drops and a big cull whenever we bite the bullet and actually cut a major release. We could inform users ahead of time when the feature would be dropped, e.g. in the deprecation warning and docs. Another approach might be to somehow monitor the usage of the old parameter, and finally drop it when usage falls below a certain percentage, in favour of the new parameter... but that sounds hard. I suppose we could also decide on some kind of mixed policy, e.g. whichever comes first: a new major release or 6 months of deprecation. Putting those details aside, I can certainly see how this new approach should (if adopted) reduce the number of major releases that would otherwise be made under a policy of semantic versioning and fearless development, e.g. whenever we slightly tweak the name of one parameter, as well as being generally more user-friendly. |
Personally think it should remain as snp_transcript entirely as the
canonical usage is to use with a single transcript
…On Wed, 5 Feb 2025, 08:29 Lee, ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In malariagen_data/anoph/dipclust.py
<#703 (comment)>
:
> - snp_transcript: Optional[base_params.transcript] = None,
+ snp_transcripts: Sequence[base_params.transcript] = [],
Hi @jonbrenas <https://github.com/jonbrenas> . Since this issue is high
priority, to be pragmatic and diplomatic, I reckon we should go with
@alimanfoo <https://github.com/alimanfoo> 's suggestion in this
situation, but it would still be good to try to reach agreement on a
package-wide policy for consistency. We would still need to work out the
details for that (e.g. param deprecation strategy, perhaps via PR #691
<#691>) anyway,
which would probably delay this enhancement beyond justification. What do
you think? Personally I'm still not entirely comfortable with this, maybe
because I'm an idealist at heart... I suppose compromise can be as
difficult as naming things!
—
Reply to this email directly, view it on GitHub
<#703 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AIKN6HOX4FZFDAZTUBNFBI32OHDVLAVCNFSM6AAAAABTQCUWOCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDKOJUHE2TGMJUGY>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Well actually, I see what you mean with how we use 'sample_sets' for
example. Understandable to want consistency.
…On Wed, 5 Feb 2025, 08:36 Sanjay Nagi, ***@***.***> wrote:
Personally think it should remain as snp_transcript entirely as the
canonical usage is to use with a single transcript
On Wed, 5 Feb 2025, 08:29 Lee, ***@***.***> wrote:
> ***@***.**** commented on this pull request.
> ------------------------------
>
> In malariagen_data/anoph/dipclust.py
> <#703 (comment)>
> :
>
> > - snp_transcript: Optional[base_params.transcript] = None,
> + snp_transcripts: Sequence[base_params.transcript] = [],
>
> Hi @jonbrenas <https://github.com/jonbrenas> . Since this issue is high
> priority, to be pragmatic and diplomatic, I reckon we should go with
> @alimanfoo <https://github.com/alimanfoo> 's suggestion in this
> situation, but it would still be good to try to reach agreement on a
> package-wide policy for consistency. We would still need to work out the
> details for that (e.g. param deprecation strategy, perhaps via PR #691
> <#691>) anyway,
> which would probably delay this enhancement beyond justification. What do
> you think? Personally I'm still not entirely comfortable with this, maybe
> because I'm an idealist at heart... I suppose compromise can be as
> difficult as naming things!
>
> —
> Reply to this email directly, view it on GitHub
> <#703 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AIKN6HOX4FZFDAZTUBNFBI32OHDVLAVCNFSM6AAAAABTQCUWOCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDKOJUHE2TGMJUGY>
> .
> You are receiving this because you commented.Message ID:
> ***@***.***>
>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jonbrenas . I reckon this is good to go, unless there's anything more you think needs doing here.
…-to-diplotype_clustering
Merging soon unless objections. |
Resolves #600.
@KellyLBennett and Nana Amoako needed this functionality earlier today so I sped it up a bit. There are currently no tests for
plot_diplotype_clustering_advanced()
so the best I can say is that it worked in the notebook. I will add tests when I have time.