
Add function for calculating niches #831

Open · wants to merge 74 commits into base: main
Conversation

@LLehner LLehner commented May 27, 2024

Description

Adds a function that calculates niches using different strategies. The initial function calculates niches based on neighborhood profiles similar to here.

This PR will get updated with methods discussed in #789.
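For readers, a minimal sketch of the neighborhood-profile idea: count the cell-type composition of each cell's spatial neighbors, then cluster the resulting profiles. Names, the dense toy adjacency matrix, and the helper below are illustrative only; the PR's actual implementation and API may differ.

```python
import numpy as np

def neighborhood_profile(adj: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-cell neighbor cell-type composition.

    adj: (n_cells, n_cells) binary adjacency matrix.
    labels: (n_cells,) integer cell-type labels.
    Returns an (n_cells, n_types) row-normalized profile matrix whose rows
    can then be clustered (e.g. with leiden) to obtain niche labels.
    """
    n_types = int(labels.max()) + 1
    one_hot = np.eye(n_types)[labels]        # (n_cells, n_types)
    counts = adj @ one_hot                   # neighbor type counts per cell
    totals = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(totals, 1)    # row-normalize, guard empty rows

# toy example: four cells in a row, alternating types 0 and 1
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
labels = np.array([0, 1, 0, 1])
profiles = neighborhood_profile(adj, labels)
```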

@LLehner LLehner marked this pull request as draft May 27, 2024 16:24
@LLehner LLehner requested a review from timtreis May 27, 2024 16:43
@timtreis timtreis added the squidpy2.0 Everything related to a Squidpy 2.0 release label May 29, 2024
@LLehner LLehner marked this pull request as ready for review October 10, 2024 12:44
@LLehner LLehner requested a review from giovp October 10, 2024 12:44
@LLehner (Member, Author) commented Oct 10, 2024

@giovp @timtreis PR is ready for review!

Some additional questions that came up, which you could have a look at:

  1. How should multiple slides be dealt with? All these methods should work with multiple slides, but I'm still not sure how the adjacency matrix changes when you run sq.gr.spatial_neighbors() on data with multiple slides. Is it a block-diagonal matrix where each block on the diagonal is the adjacency matrix for a single slide? If yes, does it suffice to calculate the neighborhood graph once for all data, or should it be done on individual slides? I noticed this can change niche results.
  2. I think some parts could be sped up by parallelization (e.g. clustering with multiple resolutions); what would you recommend there?
  3. If you run sq.calculate_niche() more than once with flavor="neighborhood", the first function call takes as much time as you would expect, but subsequent runs are much (~10x) faster (e.g. you first run the method with one cluster resolution, then call it again with other resolutions). It's almost as if the subsequent runs skip the neighborhood calculation (referring to sc.pp.neighbors here), but that shouldn't be the case: nothing is cached, and calculations happen on a new AnnData object for every function call. The "issue" also persists if you change the count matrix shape (e.g. by masking data).
  4. Would you include some form of logging or verbosity so that the user sees what the method is currently doing? Depending on the data and settings, the function can take a while.

@timtreis (Member) commented:
Hey @LLehner

> How should multiple slides be dealt with? All these methods should work with multiple slides, but I'm still not sure how the adjacency matrix changes when you run sq.gr.spatial_neighbors() on data with multiple slides. Is it a block-diagonal matrix where each block on the diagonal is the adjacency matrix for a single slide? If yes, does it suffice to calculate the neighborhood graph once for all data, or should it be done on individual slides? I noticed this can change niche results.

You mean a scenario where the user is just storing multiple slides in the same sdata object, like the `point8`, `point16`, `point24` dataset we have? These should be fully independent since they have no biological connection.

> I think some parts could be sped up by parallelization (e.g. clustering with multiple resolutions), what would you recommend there?

Do you have some rough numbers? We could give the user the option to define `n_cores` or similar and then use the parallel processing we already have, but we should definitely keep the option to run single-threaded so that it can be used inside tools like snakemake that handle job distribution across cores. Otherwise, that'd cause errors the user cannot circumvent.
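A rough shape of that option, using stdlib `concurrent.futures` as a stand-in for whatever parallel helper the project already has; `cluster_at_resolution` and `run_resolutions` are hypothetical names, not squidpy API.

```python
from concurrent.futures import ThreadPoolExecutor

def cluster_at_resolution(resolution: float) -> str:
    # stand-in for the real per-resolution call, e.g.
    # sc.tl.leiden(adata, resolution=resolution, key_added=f"leiden_{resolution}")
    return f"leiden_res{resolution}"

def run_resolutions(resolutions: list[float], n_jobs: int = 1) -> list[str]:
    # n_jobs=1 stays strictly single-threaded so external schedulers
    # (snakemake, SLURM, ...) keep control over core allocation
    if n_jobs == 1:
        return [cluster_at_resolution(r) for r in resolutions]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(cluster_at_resolution, resolutions))
```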

> If you run sq.calculate_niche() more than once with flavor="neighborhood", the first function call takes as much time as you would expect, but subsequent runs are much (~10x) faster. It's almost as if the subsequent runs skip the neighborhood calculation (referring to sc.pp.neighbors here), but that shouldn't be the case: nothing is cached, and calculations happen on a new AnnData object for every function call. The "issue" also persists if you change the count matrix shape (e.g. by masking data).

Could it be that the OS has some of the compiled bytecode or data in cache? 🤔
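One way to check: time the calls explicitly and compare a fresh interpreter against repeated calls in the same process. A plausible warm-up cost is numba JIT compilation inside the approximate-neighbor search that `sc.pp.neighbors` uses, since compiled kernels are reused for the lifetime of the process. A tiny timing helper (not part of the PR):

```python
import time

def timed(fn, *args, **kwargs):
    """Wall-clock one call. Comparing the first and later calls within the
    same process against runs in fresh interpreters helps separate real
    work from warm-up costs such as JIT compilation or OS page caches."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```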

> Would you include some form of logging or verbosity so that the user sees what the method is currently doing? Depending on the data and settings, the function can take a while.

What runtime are we talking about here? I tend to be a fan of some intermediate output (as long as one can also silence it, e.g. with a verbosity level).

@giovp (Member) left a comment:
That looks great @LLehner, I think it would benefit from a refactoring: take out the single implementations, validate arguments, and then run the functions. We discussed potentially even doing separate modules, e.g.

squidpy.gr.niches.utag
...

but I'm not sure it is necessary; what do you and @timtreis think?
Keeping it like this would also be fine, but then the arguments should be handled a bit better (e.g. check whether the passed arguments fit the single implementation/flavour requested, and otherwise fail right away). Also, how each argument maps to each implementation could be documented in the docs.

Also some work to be done in the docs more generally.

) -> AnnData | pd.DataFrame:
"""Calculate niches (spatial clusters) based on a user-defined method in 'flavor'.
The resulting niche labels will be stored in 'adata.obs'. If flavor = 'all' then all available methods
will be applied and additionally compared using cluster validation scores.
space

%(adata)s
flavor
Method to use for niche calculation. Available options are:
- `{c.NEIGHBORHOOD.s!r}` - cluster the neighborhood profile.
these should be defined in constants and added with inject docs, see other docs on how to do it

- `{c.CELLCHARTER.s!r}` - cluster adjacency matrix with Gaussian Mixture Model (GMM) using CellCharter's approach.
- `{c.SPOT.s!r}` - calculate niches using optimal transport. (coming soon)
- `{c.BANKSY.s!r}` - use the Banksy algorithm. (coming soon)
%(library_key)s
this won't work without injecting docs; see the Read the Docs build to see how it renders

but the function is not added in the docstrings so it is not rendering at all

Restrict niche calculation to a subset of the data.
table_key
Key in `spatialdata.tables` to specify an 'anndata' table. Only necessary if 'sdata' is passed.
mask
what's the use case for this?

Required if flavor == 'neighborhood'.
n_neighbors
Number of neighbors to use for 'scanpy.pp.neighbors' before clustering using leiden algorithm.
Required if flavor == 'neighborhood' or flavor == 'UTAG'.
you can use the constants also here

Comment on lines +134 to +138
if resolutions is not None:
    if not isinstance(resolutions, list):
        resolutions = [resolutions]
else:
    raise ValueError("Please provide resolutions for leiden clustering.")
this could be `resolutions = [resolutions] if not isinstance(resolutions, list) else resolutions`, but then you'd also have to validate the args for this specific niche flavour. Using separate functions, these checks could be done before



def _utag(adata: AnnData, normalize_adj: bool, spatial_connectivity_key: str) -> AnnData:
"""Performs inner product of adjacency matrix and feature matrix,
docs formatting, please do
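For context, the inner-product step being documented here can be sketched as follows: a dense toy version assuming `normalize_adj` row-normalizes the adjacency matrix, not the PR's actual code.

```python
import numpy as np

def utag_smooth(adj: np.ndarray, X: np.ndarray, normalize_adj: bool = True) -> np.ndarray:
    """Inner product of (optionally row-normalized) adjacency and features:
    each cell's features become the mean over itself and its spatial
    neighbors, which is the message-passing step clustered afterwards."""
    A = adj + np.eye(adj.shape[0])               # include self-connections
    if normalize_adj:
        A = A / A.sum(axis=1, keepdims=True)     # row-normalize -> neighborhood mean
    return A @ X

adj = np.array([[0.0, 1.0], [1.0, 0.0]])         # two mutually adjacent cells
X = np.array([[0.0], [2.0]])                     # one feature per cell
smoothed = utag_smooth(adj, X)
```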

"""
...
"""

return float(-(distribution * np.log(distribution + 1e-8)).sum())


def _iter_uid(
please add docstring to this function

Returns
The Shannon entropy
"""
return float(-(distribution * np.log(distribution + 1e-8)).sum())
I think both numpy and scipy implement entropy, why is this required?
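For reference, `scipy.stats.entropy` (natural log by default) matches the hand-rolled expression up to the `1e-8` stabilizer:

```python
import numpy as np
from scipy.stats import entropy

dist = np.array([0.5, 0.25, 0.25])

manual = float(-(dist * np.log(dist + 1e-8)).sum())  # hand-rolled version from the PR
builtin = float(entropy(dist))                       # scipy, natural log by default
```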

return float(np.pad(f1_scores, (0, n_classes - len(f1_scores))).mean())


def jensen_shannon_divergence(adatas: AnnData | list[AnnData], library_key: str, slide_key: str | None = None) -> float:
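A minimal version over plain probability vectors, using `scipy.spatial.distance.jensenshannon`, which returns the *distance* (the square root of the divergence) and so must be squared; the PR's AnnData-based signature would wrap something like this.

```python
from scipy.spatial.distance import jensenshannon

def jsd(p, q) -> float:
    """Jensen-Shannon divergence (base-e) between two niche-label
    distributions; sketch over plain probability vectors only."""
    return float(jensenshannon(p, q) ** 2)
```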
@giovp (Member) commented Oct 28, 2024

also on this question

> How should multiple slides be dealt with? All these methods should work with multiple slides, but I'm still not sure how the adjacency matrix changes when you run sq.gr.spatial_neighbors() on data with multiple slides. Is it a block-diagonal matrix where each block on the diagonal is the adjacency matrix for a single slide? If yes, does it suffice to calculate the neighborhood graph once for all data, or should it be done on individual slides? I noticed this can change niche results.

Yes, and yes. If neighbor calculation is run with multiple slides, a block-diagonal matrix of the spatial neighbors is returned, in effect treating the slides as independent.
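A toy illustration of that block-diagonal structure with `scipy.sparse.block_diag` (the slide matrices are made up): computing the graph once over all slides is equivalent to stacking the per-slide graphs, with no cross-slide edges.

```python
import numpy as np
from scipy.sparse import block_diag

# toy per-slide spatial adjacency matrices
slide_a = np.array([[0, 1],
                    [1, 0]])
slide_b = np.array([[0, 1, 0],
                    [1, 0, 1],
                    [0, 1, 0]])

# building the graph once over the concatenated slides should match
# stacking the per-slide graphs block-diagonally:
joint = block_diag([slide_a, slide_b]).toarray()
```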

> I think some parts could be sped up by parallelization (e.g. clustering with multiple resolutions), what would you recommend there?

> Do you have some rough numbers? We could give the user the option to define `n_cores` or similar and then use the parallel processing we already have, but we should definitely keep the option to run single-threaded so that it can be used inside tools like snakemake that handle job distribution across cores. Otherwise, that'd cause errors the user cannot circumvent.

Yes, agreed! You can take a look at the `Parallel` module we have now and just reuse it.

> Would you include some form of logging or verbosity so that the user sees what the method is currently doing? Depending on the data and settings, the function can take a while.

absolutely please log everything you see fit!

Labels: graph 🕸️, squidpy2.0
5 participants