-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add function for calculating niches #831
base: main
Are you sure you want to change the base?
Conversation
…into niche_definitions
for more information, see https://pre-commit.ci
…into niche_definitions
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
…into niche_definitions
for more information, see https://pre-commit.ci
@giovp @timtreis PR is ready for review! Some additional questions that came up, which you could have a look at:
|
Hey @LLehner
You mean a scenario where the user is just storing multiple slides in the same sdata object, like the
Do you have some rough numbers? We could give the user the option to define
Could it be that the OS has some of the compiled bytecode or data in cache? 🤔
What runtime are we speaking here about? I tend to be a fan of some intermediate output (if one can also shut it up, f.e. with some verbosity level) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that looks great @LLehner , I think it would benefit from a refactoring to take out the single implementation, validate arguments, and then run the functions. We discussed about potentially even doing separate modules, e.g.
squidpy.gr.niches.utag
...
but I'm not sure it is necessary, what do you and @timtreis think?
Keeping it like this would be also fine, but then the arguments should be handled a bit better (e.g. check whether the passed arguments are fit for the single implementation/flavour required, and otherwise fail right away). Also how each argument maps to each implementation could be documented in the docs.
Also some work to be done in the docs more generally.
) -> AnnData | pd.DataFrame: | ||
"""Calculate niches (spatial clusters) based on a user-defined method in 'flavor'. | ||
The resulting niche labels with be stored in 'adata.obs'. If flavor = 'all' then all available methods | ||
will be applied and additionally compared using cluster validation scores. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space
%(adata)s | ||
flavor | ||
Method to use for niche calculation. Available options are: | ||
- `{c.NEIGHBORHOOD.s!r}` - cluster the neighborhood profile. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should be defined in constants and added with inject docs, see other docs on how to do it
- `{c.CELLCHARTER.s!r}` - cluster adjacency matrix with Gaussian Mixture Model (GMM) using CellCharter's approach. | ||
- `{c.SPOT.s!r}` - calculate niches using optimal transport. (coming soon) | ||
- `{c.BANKSY.s!r}`- use Banksy algorithm. (coming soon) | ||
%(library_key)s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this won' twork without injecting docs, see the read the docs build to see how it renders
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the function is not added in the docstrings so it is not rendering it at allk
Restrict niche calculation to a subset of the data. | ||
table_key | ||
Key in `spatialdata.tables` to specify an 'anndata' table. Only necessary if 'sdata' is passed. | ||
mask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the use case for this?
Required if flavor == 'neighborhood'. | ||
n_neighbors | ||
Number of neighbors to use for 'scanpy.pp.neighbors' before clustering using leiden algorithm. | ||
Required if flavor == 'neighborhood' or flavor == 'UTAG'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use the constants also here
if resolutions is not None: | ||
if not isinstance(resolutions, list): | ||
resolutions = [resolutions] | ||
else: | ||
raise ValueError("Please provide resolutions for leiden clustering.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be resolutions = [resolutions] if not isinstance(resolutions, list) else resolutions
but then you'd also have to validate the args for this specific niche flavour. Using spearate functions, these checks could be done before
|
||
|
||
def _utag(adata: AnnData, normalize_adj: bool, spatial_connectivity_key: str) -> AnnData: | ||
"""Performs inner product of adjacency matrix and feature matrix, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs formatting, please do
"""
...
"""
return float(-(distribution * np.log(distribution + 1e-8)).sum()) | ||
|
||
|
||
def _iter_uid( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add docstring to this function
Returns | ||
The Shannon entropy | ||
""" | ||
return float(-(distribution * np.log(distribution + 1e-8)).sum()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both numpy and scipy implement entropy, why is this required?
return float(np.pad(f1_scores, (0, n_classes - len(f1_scores))).mean()) | ||
|
||
|
||
def jensen_shannon_divergence(adatas: AnnData | list[AnnData], library_key: str, slide_key: str | None = None) -> float: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think scipy has implemention for this as well.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.jensenshannon.html
also on this question
yes, and yes. If neighbor calculation is run with multiple slides, then a block diagonal matrix of the spatial neighbor is returned, in fact treating the slides to be independent.
yes agree! You can take a look at the
absolutely please log everything you see fit! |
Description
Adds a function that calculates niches using different strategies. The initial function calculates niches based on neighborhood profiles similar to here.
This PR will get updated with methods discussed in #789.