Comparing IntegrateData() and IntegrateLayers() #9494

Open
luanamarinho opened this issue Nov 19, 2024 · 0 comments


With great regard for the advances enabled by Seurat, I would like to share a few observations and questions regarding an update introduced in Seurat v5.

In our comparison of Seurat's integrative analysis workflows, we identified two key changes introduced in the IntegrateLayers() function (v5.0.0) that raised concerns. Specifically, IntegrateLayers() slices both the “stitched” global scaled data layer and the corresponding PCA embedding across cells, rather than scaling each normalized dataset separately and computing its own low-dimensional projection. These changes resulted in substantial differences in the anchor sets and moderate disparities in the structure of the Louvain-based communities. While IntegrateLayers() is more computationally efficient, with reduced memory usage and runtime, the use of these subsets raises questions about the statistical robustness of the slicing approach.
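
To make the first change concrete, here is a toy sketch in plain Python. It is not Seurat code and does not reproduce Seurat's implementation. The expression values and variable names are hypothetical, and the use of the sample standard deviation is an assumption. It only contrasts scaling each dataset on its own with scaling the merged data once and then slicing it by dataset.

```python
from statistics import mean, stdev


def zscore(values, center, scale):
    """Standardize values with an explicitly supplied center and scale."""
    return [(v - center) / scale for v in values]


# Hypothetical normalized expression of one gene in two datasets (toy numbers).
batch_a = [1.0, 2.0, 3.0]
batch_b = [4.0, 6.0, 8.0]

# Strategy 1: scale each dataset separately, using its own mean and sd.
per_batch_a = zscore(batch_a, mean(batch_a), stdev(batch_a))
per_batch_b = zscore(batch_b, mean(batch_b), stdev(batch_b))

# Strategy 2: scale the merged ("stitched") data once, then slice by dataset.
merged = batch_a + batch_b
global_scaled = zscore(merged, mean(merged), stdev(merged))
sliced_a = global_scaled[:len(batch_a)]
sliced_b = global_scaled[len(batch_a):]

# Per-dataset scaling centers every dataset at zero, whereas the slices of
# the globally scaled data keep the offset between the two datasets.
print("per-dataset means:", mean(per_batch_a), mean(per_batch_b))
print("sliced means:     ", mean(sliced_a), mean(sliced_b))
```

In this toy case the separately scaled datasets become indistinguishable, while the slices of the globally scaled data remain offset from each other. Any downstream step that consumes the scaled values, such as anchor finding, would therefore receive different inputs under the two strategies.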

Beyond concerns about procedural rigor, we hypothesize that IntegrateLayers() may prioritize the preservation of global aspects of the data, such as inter-cell neighborhood relationships, potentially at the expense of accurately capturing local relationships, such as intra-cluster variation. This is consistent with our observation that IntegrateLayers() produces better-separated, tighter clusters, whereas IntegrateData(), which integrates the scaled data layers, yields Louvain clusters that are more spread out and appears better suited to capturing fine-grained patterns such as subtypes or sub-cell states.

While we recognize that slicing the embeddings may be important for keeping axis directions compatible during integration, we would appreciate further clarification of, and a mathematical justification for, these two key changes. Understanding the rationale behind them would help us assess whether this shift in strategy limits the resolution of local heterogeneity, and would allow us to make more informed decisions about the trade-off between computational efficiency and the preservation of biological detail.
