UMAP implementation in igraph #3408

Open
iosonofabio opened this issue Dec 18, 2024 · 3 comments
@iosonofabio

What kind of feature would you like to request?

Additional function parameters / changed functionality / changed defaults?

Please describe your wishes

Dear devs,

Thank you for maintaining scanpy, which is a great piece of software.

I am a core developer of igraph and around a year ago we started providing an implementation of UMAP. It is accessible via the igraph Python package and the actual computations run entirely in C. This is designed to be an alternative to the original UMAP implementation that requires numba, llvmlite, etc. Since scanpy relies on igraph for clustering already, this might be useful for folks who have trouble with the dependency stack.

Just like Leiden clustering can now use either leidenalg or directly igraph, it would be nice to let people choose flavor='igraph' for embedding as well.

Our implementation requires as input a nearest neighbor graph with distances and has two parts, just like the original UMAP:

  1. a function to compute what scanpy now calls "connectivities" from the distances. Among other things, this symmetrizes the kNN graph.
  2. a function to compute the embedding from the connectivity graph.
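
To illustrate the symmetrization in the first step, here is a minimal pure-Python sketch of the fuzzy union used by the original UMAP, w = a + b - a*b, applied to the two directed weights of each vertex pair. It assumes the kNN distances have already been converted to membership weights in [0, 1]. The function name `symmetrize_weights` and the dict-based graph representation are hypothetical, chosen for illustration only; this is not igraph's or scanpy's API, and igraph's C implementation may differ in detail.

```python
def symmetrize_weights(directed):
    """Combine directed kNN weights into undirected ones via fuzzy union.

    `directed` maps (source, target) pairs to weights in [0, 1].
    Returns a dict keyed by (i, j) pairs with i < j.
    (Hypothetical helper, not part of igraph or scanpy.)
    """
    undirected = {}
    for (i, j), a in directed.items():
        if i == j:
            continue  # ignore self-loops
        key = (min(i, j), max(i, j))
        if key in undirected:
            continue  # already handled when the reverse edge was visited
        b = directed.get((j, i), 0.0)  # weight of the reverse edge, 0 if absent
        undirected[key] = a + b - a * b  # probabilistic union of the two weights
    return undirected


# Toy directed kNN graph: 0 -> 1 and 1 -> 0 both exist, 1 -> 2 is one-way.
knn_weights = {(0, 1): 1.0, (1, 0): 0.5, (1, 2): 0.8}
connectivities = symmetrize_weights(knn_weights)
```

A one-way edge keeps its weight, while a reciprocated edge gets a weight at least as large as either direction, so the result is an undirected graph that an embedding step can consume.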

Would you be interested in this? If so, I can make a PR for scanpy and perhaps one of you can review it?

@iosonofabio iosonofabio added the Triage 🩺 This issue needs to be triaged by a maintainer label Dec 18, 2024
@ricor07

ricor07 commented Dec 26, 2024

Hello, I'd be glad to provide this enhancement and join the project. Can you assign this issue to me? Thanks

@flying-sheep flying-sheep removed the Triage 🩺 This issue needs to be triaged by a maintainer label Jan 9, 2025
@flying-sheep
Member

flying-sheep commented Jan 9, 2025

Sure, I assigned you for now! Please note:

  1. We have a contribution guide

  2. A similar PR exists that you can use as orientation: (feat): igraph leiden implementation now included as an option in sc.tl.leiden #2815

    Specifically, please add flavor='igraph' as a non-default option.

Regarding design, there are two considerations you can engage with if you want:

  • It’s worth investigating how best to include the two-step process around connectivities that @iosonofabio mentioned. One way would be a kind of sklearn-like “transformer” that creates the connectivities on .fit() and the UMAP on .transform(), if that makes sense.
  • Another consideration is Compute UMAP from pre-existing distance matrix #2157: computing the UMAP from an existing distance matrix instead of raw data. If you end up going for the two-step solution, incorporating this would also make sense.
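
As a rough sketch of what such a two-step “transformer” could look like: the class below builds connectivities from precomputed distances in .fit() and produces the embedding in .transform(). Everything here is hypothetical. `TwoStepEmbedder`, `toy_connectivities`, and `toy_embedding` are illustrative names, the two backends are placeholders rather than UMAP, and a real implementation would plug in the igraph or umap-learn functions instead.

```python
import math


class TwoStepEmbedder:
    """Hypothetical sklearn-like wrapper around a two-step embedding.

    fit() turns precomputed distances into connectivities;
    transform() turns the stored connectivities into coordinates.
    """

    def __init__(self, compute_connectivities, compute_embedding):
        self.compute_connectivities = compute_connectivities
        self.compute_embedding = compute_embedding
        self.connectivities_ = None

    def fit(self, distances):
        self.connectivities_ = self.compute_connectivities(distances)
        return self  # allow chaining, as sklearn estimators do

    def transform(self):
        if self.connectivities_ is None:
            raise RuntimeError("call fit() before transform()")
        return self.compute_embedding(self.connectivities_)


def toy_connectivities(distances):
    # Placeholder backend: map each edge distance d to exp(-d).
    return {edge: math.exp(-d) for edge, d in distances.items()}


def toy_embedding(connectivities):
    # Placeholder backend, NOT UMAP: put the vertices on a line in sorted order.
    vertices = sorted({v for edge in connectivities for v in edge})
    return {v: (float(k), 0.0) for k, v in enumerate(vertices)}


embedder = TwoStepEmbedder(toy_connectivities, toy_embedding)
coords = embedder.fit({(0, 1): 0.0, (1, 2): 1.0}).transform()
```

Because each step is an injected callable, either or both could be swapped between backends independently. Note that a true sklearn transformer takes the data as an argument to transform(); the argument-free version here only mirrors the fit/transform split described above.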

If you want to have a call with us about how to best design an API around all that (or about which parts you want to leave out), please tell us. If you already have a straightforward idea about what to do, please go ahead!

@iosonofabio
Author

Thanks both. Note that scanpy already has the two-step process in place internally, and you could swap in igraph for either or both steps.

Actually it's three steps, the first one being the creation of the kNN graph. Scanpy already extracted that part from the official UMAP implementation for performance reasons.

Let me know if I can help
