Strong impact of number of ADT PCs on WNN UMAP? #9482

Open
erflynn opened this issue Nov 14, 2024 · 1 comment

Comments


erflynn commented Nov 14, 2024

I'm running WNN following the tutorial on a dataset of about 200,000 skin cells. I noticed that the choice of the number of ADT PCs has a huge impact on the UMAP: it clearly looks better with more ADT PCs, even though the elbow plot suggests we should use fewer. I am using the exact same settings (30 RNA PCs) and varying only the number of ADT PCs. Seurat version 4.3.0.
[Image: adt_pcs]

wnn <- FindMultiModalNeighbors(
  wnn, reduction.list = list("pca", "apca"),
  dims.list = list(1:30, 1:n_adt_pcs), modality.weight.name = "RNA.weight",
  prune.snn = 1 / 20 # adjusted to avoid small clusters
)
wnn <- RunUMAP(wnn, nn.name = "weighted.nn", reduction.name = "wnn.umap", reduction.key = "wnnUMAP_")

Have you noticed this before? Is this expected behavior? What would you recommend doing to proceed?


erflynn commented Nov 20, 2024

Following up on this with a reprex.
Using the bmcite dataset and the code in the [WNN tutorial](https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis), we see the same pattern: too few ADT or RNA PCs lead to poor UMAPs. However, the numbers at which this occurs are much lower (e.g. 3 or 5 PCs rather than 10 or 15). Once there are "enough" PCs, the UMAPs seem to look relatively similar? (Though in other cases, particularly in sub-clustering, we've found that "too many" PCs make the UMAP look worse.)

[Image: bmcite_rna_adt_pc_vary]
[Image: bmcite_pc_vary_selected]
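
For reference, a minimal sketch of the kind of sweep behind the comparison above. It assumes the bmcite object is loaded through SeuratData and preprocessed as in the WNN tutorial. The helper name `run_wnn_umap` and the grid values are illustrative only, not the exact settings used for the figures.

```r
library(Seurat)
library(SeuratData)  # provides the bmcite CITE-seq dataset
library(ggplot2)
library(patchwork)

# Load and preprocess both modalities, following the WNN tutorial.
InstallData("bmcite")
bm <- LoadData(ds = "bmcite")

DefaultAssay(bm) <- "RNA"
bm <- NormalizeData(bm)
bm <- FindVariableFeatures(bm)
bm <- ScaleData(bm)
bm <- RunPCA(bm)

DefaultAssay(bm) <- "ADT"
VariableFeatures(bm) <- rownames(bm[["ADT"]])  # use all antibodies as features
bm <- NormalizeData(bm, normalization.method = "CLR", margin = 2)
bm <- ScaleData(bm)
bm <- RunPCA(bm, reduction.name = "apca")

# Hypothetical helper: rerun WNN and UMAP for a given number of ADT PCs,
# holding the number of RNA PCs fixed.
run_wnn_umap <- function(obj, n_adt_pcs, n_rna_pcs = 30) {
  obj <- FindMultiModalNeighbors(
    obj, reduction.list = list("pca", "apca"),
    dims.list = list(1:n_rna_pcs, 1:n_adt_pcs),
    modality.weight.name = "RNA.weight"
  )
  RunUMAP(obj, nn.name = "weighted.nn", reduction.name = "wnn.umap",
          reduction.key = "wnnUMAP_")
}

adt_pc_grid <- c(3, 5, 10, 18)  # illustrative values only
plots <- lapply(adt_pc_grid, function(n) {
  obj <- run_wnn_umap(bm, n_adt_pcs = n)
  DimPlot(obj, reduction = "wnn.umap", group.by = "celltype.l2", label = TRUE) +
    NoLegend() +
    ggtitle(paste(n, "ADT PCs"))
})

# One panel per setting, arranged in a 2-column grid.
wrap_plots(plots, ncol = 2)
```

Swapping the roles (fixing the ADT PCs and varying `n_rna_pcs`) gives the corresponding RNA sweep.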

I understand that a low number of PCs does not capture the variation in the data, which makes the nearest-neighbor space for that modality poor. Since WNN determines the weights by considering how well the RNA and ADT PCs each predict cell identity in each space, this adds some noise.
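
One way to check this reasoning would be to compare the per-cell modality weights across settings. Below is a minimal sketch, assuming `obj` is a Seurat object that already has `pca` and `apca` reductions as in the tutorial. The helper name `rna_weight_summary` is made up for illustration, and `RNA.weight` is the metadata column written by `FindMultiModalNeighbors` with the settings shown.

```r
library(Seurat)

# Hypothetical helper: for each number of ADT PCs, rerun WNN and summarize
# the distribution of per-cell RNA modality weights.
rna_weight_summary <- function(obj, adt_pc_grid, n_rna_pcs = 30) {
  rows <- lapply(adt_pc_grid, function(n) {
    tmp <- FindMultiModalNeighbors(
      obj, reduction.list = list("pca", "apca"),
      dims.list = list(1:n_rna_pcs, 1:n),
      modality.weight.name = "RNA.weight",
      verbose = FALSE
    )
    w <- tmp$RNA.weight  # per-cell RNA weight stored in the object metadata
    data.frame(
      n_adt_pcs = n,
      median_rna_weight = median(w),
      iqr_rna_weight = IQR(w)
    )
  })
  do.call(rbind, rows)
}

# Usage (object name assumed): rna_weight_summary(bm, c(3, 5, 10, 18))
```

If the median RNA weight shifts strongly as ADT PCs are added, the change in the UMAP would be coming from the weighting and not only from the ADT neighbor graph itself.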

However, I do not understand why, in the original example, selecting more than 15 ADT PCs results in a much cleaner WNN UMAP even though the elbow plot does not support it. We've also checked: these additional PCs do not associate with RNA cell type, and ADT UMAPs with more PCs still look very messy. Can you lend any insight into why adding these PCs would be helpful? Given this, how do you recommend choosing the number of ADT PCs in larger datasets?
