Selective read/write mudata modalities #63

Open
racng opened this issue Dec 7, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

racng commented Dec 7, 2023

Is your feature request related to a problem? Please describe.
Reading and writing MuData can be slow. For example, after some TCR sequence analyses, the MuData object takes longer to read and write. Sometimes I add a single annotation to mdata.obs, but saving then requires writing all modalities again. I appreciate that a single modality can be read or written with a path like mdata.h5mu/rna, but there is no option to read and write only the elements that do not belong to a modality, such as mdata.obs, mdata.var, and mdata.obsm. I imagine this could save time in several use cases.

Describe the solution you'd like
The ability to specify a list of modalities to read or write, with the option to pass an empty list so that only the non-modality elements of mdata are read or written. This could be implemented as an extra argument to the existing MuData IO functions.

@racng racng added the enhancement New feature or request label Dec 7, 2023
Collaborator

gtca commented Jul 2, 2024

Thank you, @racng, for the detailed use case description!

Ideally we would stay close to anndata's implementation of backed mode, but the interface for what you describe was scrapped there.

Just as in anndata, there's currently a backed mode in mudata that might help:

import mudata
mdata = mudata.read("dataset.h5mu", backed=True)

I can also link related issues that discuss similar challenges in AnnData:

The last one showcases ongoing work to make the API for reading elements public, but it is still a work in progress.
I am also not sure whether writing data back to disk is part of that effort.

There is another experimental approach to out-of-memory operations with AnnData/MuData objects that you can try: https://github.com/scverse/shadows. It is not a stable library yet, but hopefully it can work as a drop-in solution for your case.
