MapQuery for many cells and batches returns bad cluster prediction results #9467
martibonomi asked this question in Q&A
Hi,
I am analysing scRNA-seq data from three different datasets comprising only T cells.
Because the datasets together contain 170 batches and ~435k cells, I wanted to build a reference from 30 batches (70k cells in total) and then project the remaining 140 batches (365k cells in total) onto it, so that each query cell is placed on the reference UMAP and assigned the corresponding reference cluster.
To do this, I integrated the reference batches with the RPCA method, because CCA did not give good integration results, using the following code:
I then mapped each of the remaining 140 query batches onto the reference independently, using the following code:
However, when I plot the projection results for the query batches, the predicted clusters are not confined to the same UMAP regions as in the reference but are spread all over the UMAP. When I plot the prediction scores, most cells have a very low cluster prediction score, as shown in the following figure (left: clusters in the reference; right: predicted clusters of the query batches projected onto the reference, coloured by cluster prediction score):
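To put numbers on the low-score pattern, one option is to tabulate the scores per predicted cluster and, since each batch was mapped separately, per query batch. The sketch below is plain, language-neutral Python and assumes the query metadata has been exported as rows of (batch, predicted cluster, prediction score), for example from the `predicted.<name>` and `predicted.<name>.score` columns that MapQuery typically adds. The 0.5 cutoff, the toy data and the helper name `summarize_scores` are hypothetical choices for illustration only.

```python
from collections import defaultdict
from statistics import median


def summarize_scores(rows, key_index, threshold=0.5):
    """Group (batch, cluster, score) rows by batch (key_index=0) or by
    predicted cluster (key_index=1); report the cell count, the median
    score and the fraction of cells scoring below `threshold`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    summary = {}
    for key, scores in groups.items():
        low = sum(1 for s in scores if s < threshold)
        summary[key] = {
            "n": len(scores),
            "median": median(scores),
            "frac_low": low / len(scores),
        }
    return summary


# Toy data, not from the question: (query batch, predicted cluster, score)
rows = [
    ("batch1", "C0", 0.9), ("batch1", "C0", 0.4), ("batch1", "C1", 0.3),
    ("batch2", "C1", 0.2), ("batch2", "C1", 0.35), ("batch2", "C0", 0.8),
]
by_cluster = summarize_scores(rows, key_index=1)
by_batch = summarize_scores(rows, key_index=0)
for name, stats in sorted(by_cluster.items()):
    print(name, stats)
```

Read loosely: a cluster whose cells almost all score low may point to a population that is poorly represented in the reference, whereas uniformly low scores within particular batches may instead point to a batch-specific mapping problem.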
What can I do to improve these results? Are there parameters I can change to improve the projection and the predictions? And is this approach correct in the first place?
Is it more likely that the projected cells are placed correctly on the UMAP but should be assigned the cluster of the region they land in, or that they really belong to their predicted cluster but have been projected to the wrong position?
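One rough way to probe this question is to compare, for each query cell, its predicted cluster with the majority cluster among its k nearest reference cells at its projected UMAP position (MapQuery typically stores that projection as a `ref.umap` reduction). The sketch below is plain Python; `neighbour_vote` and `agreement_rate` are hypothetical helper names, k=3 is arbitrary, and the brute-force distance search is only suitable for toy data, not for 70k reference cells.

```python
import math
from collections import Counter


def neighbour_vote(query_xy, ref_points, k=3):
    """Majority reference cluster among the k reference cells closest
    (Euclidean distance in 2-D UMAP space) to one query cell."""
    nearest = sorted(ref_points, key=lambda p: math.dist(query_xy, p[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def agreement_rate(query_cells, ref_points, k=3):
    """Fraction of query cells whose predicted cluster matches the
    neighbourhood vote at their projected UMAP position."""
    hits = sum(
        1 for xy, predicted in query_cells
        if neighbour_vote(xy, ref_points, k) == predicted
    )
    return hits / len(query_cells)


# Toy reference: two well-separated clusters in UMAP space.
ref_points = [
    ((0.0, 0.0), "C0"), ((0.1, 0.2), "C0"), ((0.2, 0.1), "C0"),
    ((5.0, 5.0), "C1"), ((5.1, 4.9), "C1"), ((4.9, 5.2), "C1"),
]
# Toy query cells: (projected UMAP position, predicted cluster).
query_cells = [
    ((0.1, 0.1), "C0"),    # label agrees with its neighbourhood
    ((5.0, 5.1), "C1"),    # label agrees with its neighbourhood
    ((0.05, 0.15), "C1"),  # label disagrees with its neighbourhood
]
print(agreement_rate(query_cells, ref_points))
```

A low agreement rate shows only that position and label disagree; by itself it does not say which of the two is wrong, since UMAP distances are a coarse summary of the space in which labels are transferred.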
Thank you very much for your help!