Sankey diagram and transition matrix issue #795

Open
Enterprise-J opened this issue Feb 4, 2025 · 3 comments

Enterprise-J commented Feb 4, 2025

Issue 1

I am running the Sankey analysis in moscot but have a couple of issues:

import moscot.plotting as mtp

# tp is a TemporalProblem that has already been prepared and solved
tp.sankey(
    source=0,
    target=1,
    source_groups="cluster_simple",
    target_groups="cluster_simple",
)
mtp.sankey(tp, dpi=100, interpolate_color=True)

This produces the plot below. However, only Stem cells are supposed to exist in the source distribution, and no endocrine- or nonendocrine-related cell types should appear in either distribution. From the transition matrix I can roughly see why, but shouldn't the Sankey diagram also account for relative abundance (to reflect the "evolution between time points")?

[Image: Sankey diagram produced by the call above]

The transition matrix is:

{'transition_matrices': [                Endocrine-prog  Endoderm   Foregut  Nonendocrine          Stem
  Endocrine-prog    3.856930e-04  0.067454  0.028325      0.001275  9.494407e-06
  Endoderm          5.453516e-04  0.446731  0.004342      0.026980  8.877856e-04
  Foregut           1.890343e-04  0.085968  0.022018      0.007978  2.774018e-08
  Nonendocrine      2.741671e-04  0.147414  0.001323      0.081231  1.508115e-03
  Stem              5.864883e-13  0.016001  0.000005      0.000540  5.861260e-02],
 'key': 'cluster_simple',
 'source': 0,
 'target': 1,
 'source_groups': 'cluster_simple',
 'target_groups': 'cluster_simple',
 'captions': ['(0, 1.0)']}
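The point about relative abundance can be checked directly on the numbers above. Summing each row gives the total outgoing mass per source group; assuming rows are source groups and that the Sankey band widths are drawn from these values, this shows why Stem looks small: it carries only about 7.5% of the mass, while Endoderm carries about 48%. The values are copied from the matrix above; the rest is plain pandas.

```python
import pandas as pd

groups = ["Endocrine-prog", "Endoderm", "Foregut", "Nonendocrine", "Stem"]
# Values copied from the reported transition matrix (assumed: rows = source, columns = target).
tm = pd.DataFrame(
    [
        [3.856930e-04, 0.067454, 0.028325, 0.001275, 9.494407e-06],
        [5.453516e-04, 0.446731, 0.004342, 0.026980, 8.877856e-04],
        [1.890343e-04, 0.085968, 0.022018, 0.007978, 2.774018e-08],
        [2.741671e-04, 0.147414, 0.001323, 0.081231, 1.508115e-03],
        [5.864883e-13, 0.016001, 0.000005, 0.000540, 5.861260e-02],
    ],
    index=groups,
    columns=groups,
)

# Total outgoing mass per source group (row sums); the whole matrix sums to roughly 1.
outgoing = tm.sum(axis=1)
print(outgoing.round(3))
print(round(outgoing.sum(), 3))
```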

Issue 2

I also tried setting threshold=0.01, but this gives an erroneous plot and a transition matrix that is entirely NaN:

[Image: Sankey diagram produced with threshold=0.01]

{'transition_matrices': [                Endocrine-prog  Endoderm  Foregut  Nonendocrine  Stem
  Endocrine-prog             NaN       NaN      NaN           NaN   NaN
  Endoderm                   NaN       NaN      NaN           NaN   NaN
  Foregut                    NaN       NaN      NaN           NaN   NaN
  Nonendocrine               NaN       NaN      NaN           NaN   NaN
  Stem                       NaN       NaN      NaN           NaN   NaN],
 'key': 'cluster_simple',
 'source': 0,
 'target': 1,
 'source_groups': 'cluster_simple',
 'target_groups': 'cluster_simple',
 'captions': ['(0, 1.0)']}

Did I miss anything here?

Thanks,
Braxton

@selmanozleyen
Collaborator

Hi @Enterprise-J, I will have a look, but in the meantime could you share an example where this happens with one of our built-in datasets, so that there is reproducible code that someone can run from scratch? If not, that's fine, but it would speed up the process.

@Enterprise-J
Author

Enterprise-J commented Feb 5, 2025

Hi, I think I know the cause of issue 1. The Sankey diagram plots the transition probability values directly. Because fewer than 1% of the cells at time point 0 are misclassified and are assigned high transition probabilities, the diagram is dominated by them and becomes counterintuitive.
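One way to make the diagram reflect relative abundance is to weight each source group's outgoing transitions by its cell count, so a tiny misclassified group cannot dominate. The sketch below is a hypothetical helper, not a moscot function; it assumes the group-level transition matrix is available as a pandas DataFrame and the source group sizes as a Series (for example from `value_counts()` on `cluster_simple` restricted to time point 0). The toy numbers are illustrative only.

```python
import pandas as pd


def abundance_weighted_flows(transitions: pd.DataFrame, source_counts: pd.Series) -> pd.DataFrame:
    """Hypothetical helper (not part of moscot).

    Rescale group-level transitions so that each source group's total
    outgoing flow equals its number of cells.
    """
    row_sums = transitions.sum(axis=1)
    # Row-normalize; rows with zero total mass become zeros instead of NaN.
    fractions = transitions.div(row_sums.where(row_sums > 0), axis=0).fillna(0.0)
    # Weight each row by the cell count of that source group (missing groups count as 0).
    counts = source_counts.reindex(transitions.index).fillna(0)
    return fractions.mul(counts, axis=0)


# Toy example: the rare group has most of the raw mass but few cells.
toy = pd.DataFrame(
    [[0.05, 0.05], [0.6, 0.3]],
    index=["Stem", "Misclassified"],
    columns=["Stem", "Endoderm"],
)
counts = pd.Series({"Stem": 990, "Misclassified": 10})
flows = abundance_weighted_flows(toy, counts)
print(flows)
```

After weighting, the Stem row carries 990 units of flow and the misclassified row only 10, whereas in the raw matrix the misclassified row held 90% of the mass.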

For issue 2:

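import moscot.plotting as mtp
from moscot.problems.time import TemporalProblem

tp = TemporalProblem(adata)  # adata contains ~0.9M cells from 7 time points
# tp = tp.prepare(time_key=...)  # required before solve(); omitted in the original report
tp = tp.solve(epsilon=1e-3, tau_a=0.95, scale_cost="mean")
tp.sankey(
    source=0,
    target=1,
    source_groups="cluster_simple",
    target_groups="cluster_simple",
    threshold=0.01,
)
mtp.sankey(tp, dpi=100, interpolate_color=True)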

I am using moscot 0.4.0 and anndata 0.11.2 with Python 3.12.8. Sorry, I can't share the AnnData object because it is too large.

@selmanozleyen
Collaborator

Hi @Enterprise-J, you are right about the first issue. The second one also happens for me, but I noticed that 0.01 is too high a threshold in this context, so I would suggest setting a smaller value for your case. Still, this shows we should probably raise an error when the value is so high that it breaks the output, or handle it better somehow. So thanks for writing! Let me know whether the result becomes more sensible when you lower the value.
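The all-NaN matrix is consistent with one plausible mechanism, sketched below under assumptions not confirmed against moscot's source: the threshold zeroes out entries below it, and a row normalization follows. If the thresholded entries are tiny (as cell-level coupling entries are for ~0.9M cells), every row becomes all zeros and 0/0 yields NaN. `threshold_then_normalize` and `check_threshold` are hypothetical names, not part of moscot; the second shows the kind of error the collaborator suggests raising.

```python
import numpy as np


def threshold_then_normalize(matrix: np.ndarray, threshold: float) -> np.ndarray:
    """Hypothetical: zero out entries below `threshold`, then row-normalize."""
    kept = np.where(matrix >= threshold, matrix, 0.0)
    row_sums = kept.sum(axis=1, keepdims=True)
    # A row whose entries were all removed sums to 0, so 0 / 0 gives NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return kept / row_sums


def check_threshold(matrix: np.ndarray, threshold: float) -> None:
    """Hypothetical guard: fail loudly instead of silently returning NaNs."""
    emptied = ~(matrix >= threshold).any(axis=1)
    if emptied.any():
        raise ValueError(
            f"threshold={threshold} removes every entry in {int(emptied.sum())} row(s); "
            f"the largest entry is {matrix.max():.3g}"
        )


# Uniform coupling between 200 and 200 cells: each entry is 1 / 200**2 = 2.5e-5.
n = 200
coupling = np.full((n, n), 1.0 / n**2)
too_high = threshold_then_normalize(coupling, 0.01)  # every row emptied -> all NaN
reasonable = threshold_then_normalize(coupling, 1e-6)  # rows survive and sum to 1
```

Choosing a threshold below the largest entry, or relative to a quantile of the entries, avoids emptying rows; the larger the dataset, the smaller the individual entries and the smaller the threshold has to be.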
