Question about acceptable AUC, improving AUC #110

boyangzhang1993 · 2023-11-27T21:17:52Z

Hi Tangram Team,

I've been working with tangram-sc and am seeking advice on improving the AUC (Area Under the Curve) of my models for 10x Visium data.

Could you provide any tips or best practices for optimizing the AUC when using tangram-sc? Are there specific parameters, preprocessing steps, or data characteristics that typically have a significant impact on AUC improvement in the context of single-cell data analysis?
Is there a range of AUC that can be considered acceptable?
We have duplicated two samples from the same subject and several subjects in total. These samples were integrated using Harmony. My question is: Would you recommend using all these integrated samples as one combined input for Tangram, or is it more beneficial to split them into individual samples and run Tangram separately on each?

Any insights or suggestions you could offer would be greatly appreciated!

Thanks in advance for your help!

gaddamshreya1 · 2024-01-23T21:39:43Z

Thank you for your patience as well as your interest in Tangram!

Make sure the single-cell and spatial data are on the same scale: either both normalized or both raw counts.
Use a set of marker genes for the cell types in your dataset as training genes.
One key assumption is that the single-cell and spatial data are not only from the same species and disease system but also from the same specimen. Tangram will still do a good job otherwise, but for the best results we recommend using single-cell and spatial data from the same specimen.
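
The first two tips can be checked before mapping. Below is a minimal sketch in plain Python, not part of Tangram: `looks_like_raw_counts` and `shared_training_genes` are hypothetical helper names, the whole-number test is only a heuristic for spotting raw counts, and the toy values and gene names stand in for your own expression matrices and marker list.

```python
def looks_like_raw_counts(values, tol=1e-8):
    """Heuristic (assumption): raw counts are non-negative whole numbers."""
    return all(v >= 0 and abs(v - round(v)) <= tol for v in values)


def shared_training_genes(markers, sc_genes, sp_genes):
    """Keep markers measured in both datasets, preserving marker order."""
    sc_set, sp_set = set(sc_genes), set(sp_genes)
    seen = set()
    kept = []
    for gene in markers:
        if gene in sc_set and gene in sp_set and gene not in seen:
            kept.append(gene)
            seen.add(gene)
    return kept


# Toy values standing in for entries of each expression matrix.
sc_values = [0, 3, 1, 12, 0, 7]            # whole numbers: looks like raw counts
sp_values = [0.0, 1.39, 0.69, 2.56, 0.0]   # fractional: looks normalized

sc_is_raw = looks_like_raw_counts(sc_values)
sp_is_raw = looks_like_raw_counts(sp_values)
same_scale = sc_is_raw == sp_is_raw        # a mismatch means one side needs converting

# Hypothetical marker list and gene names for illustration only.
markers = ["CD3E", "MS4A1", "LYZ", "CD3E"]
training_genes = shared_training_genes(
    markers,
    sc_genes=["CD3E", "LYZ", "GNLY"],
    sp_genes=["LYZ", "CD3E", "EPCAM"],
)
print(same_scale, training_genes)
```

If `same_scale` is false, bring both datasets to the same representation before mapping. Markers missing from either dataset cannot serve as training genes, so it helps to see which ones drop out.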

The curve used to compute the AUC contrasts score with spatial sparsity. Here the score is the cosine similarity between the measured and predicted spatial gene expression. Generally, the sparser a gene is, the poorer its score, because Tangram attempts to correct the expression of these genes. So the AUC depends to some extent on the quality of the dataset: if genes are sparser in one dataset than in another, that will reduce the AUC. To answer your question, I'm not sure I can recommend an acceptable range of AUC. I would, however, recommend using this metric together with the diagnostic plots and the cell annotation maps to judge the quality of the mapping by Tangram.
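
To make the two quantities behind that curve concrete, here is a small sketch in plain Python. It is an illustration under assumptions, not Tangram's implementation: the function names and toy expression values are made up, sparsity is taken to mean the fraction of spots with zero measured expression, and the AUC itself is not computed.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length expression vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def sparsity(values):
    """Fraction of spots with zero measured expression (assumed definition)."""
    return sum(1 for v in values if v == 0) / len(values)


# Toy per-spot expression for two hypothetical genes across six spots.
measured = {
    "dense_gene": [5, 3, 4, 6, 2, 5],
    "sparse_gene": [0, 0, 4, 0, 0, 0],
}
predicted = {
    "dense_gene": [4.8, 3.1, 4.2, 5.5, 2.4, 5.1],
    "sparse_gene": [0.6, 0.4, 1.5, 0.7, 0.5, 0.3],
}

# Each gene becomes one (score, sparsity) point on the curve.
per_gene = {
    gene: (cosine_similarity(measured[gene], predicted[gene]), sparsity(measured[gene]))
    for gene in measured
}
for gene, (score, sp) in per_gene.items():
    print(f"{gene}: score={score:.3f}, sparsity={sp:.3f}")
```

In this toy case the sparse gene scores lower because the prediction fills in expression at spots where the measurement is zero, which is the pattern described above.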
If the samples integrated by Harmony are from the same subject, it is OK to use the integrated dataset! However, if for each of your single cell datasets you also have a corresponding spatial dataset, then I would say map them separately.
