Question about acceptable AUC, improving AUC #110

Open
boyangzhang1993 opened this issue Nov 27, 2023 · 1 comment

Comments

@boyangzhang1993

boyangzhang1993 commented Nov 27, 2023

Hi Tangram Team,

I've been working with tangram-sc and am seeking advice on improving the AUC (Area Under the Curve) of my models for 10x Visium data.

  • Could you provide any tips or best practices for optimizing the AUC when using tangram-sc? Are there specific parameters, preprocessing steps, or data characteristics that typically have a significant impact on AUC improvement in the context of single-cell data analysis?
  • Is there a range of AUC that can be considered acceptable?
  • We have two duplicate samples from the same subject, and several subjects in total. These samples were integrated using Harmony. My question is: would you recommend using all these integrated samples as one combined input for Tangram, or is it more beneficial to split them into individual samples and run Tangram separately on each?

Any insights or suggestions you could offer would be greatly appreciated!

Thanks in advance for your help!

@gaddamshreya1
Collaborator

Hi @boyangzhang1993!

Thank you for your patience as well as your interest in Tangram!

  1. General best practices for Tangram are:
  • Making sure the single-cell and spatial data are in the same state: either both normalized or both raw counts.
  • Using marker genes for the cell types in your dataset as the training genes.
  • A key assumption is that the single-cell and spatial data come not only from the same species and disease system but also from the same specimen. Tangram will still do a good job otherwise, but for the best results we recommend using single-cell and spatial data from the same specimen.
  2. If you take a look at the curve used to compute the AUC, it plots score against spatial sparsity. Here the score is the cosine similarity between the measured and predicted spatial gene expression. Generally, the higher the sparsity of a gene, the poorer its score; this is because Tangram attempts to correct the expression of these genes. So I would say the AUC depends somewhat on the quality of the dataset: if genes are sparser in one dataset than in another, that would reduce the AUC. To answer your question, I'm not sure I can recommend an acceptable range of AUC. I would, however, recommend using this metric together with the diagnostic plots and the cell annotation maps to judge the quality of the mapping by Tangram.

  3. If the samples integrated by Harmony are from the same subject, it is OK to use the integrated dataset! However, if you also have a corresponding spatial dataset for each of your single-cell datasets, then I would map them separately.
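
To make the score and sparsity quantities described above more concrete, here is a minimal sketch in plain Python. It is not Tangram's own code: the helper names (`cosine_similarity`, `sparsity`) and the toy expression values are made up for illustration, and sparsity is assumed here to mean the fraction of spots where the measured expression is zero. It only shows how a per-gene score and sparsity could be computed before they are plotted against each other.

```python
import math


def cosine_similarity(measured, predicted):
    """Cosine similarity between two equal-length expression vectors."""
    dot = sum(m * p for m, p in zip(measured, predicted))
    norm_m = math.sqrt(sum(m * m for m in measured))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_m == 0 or norm_p == 0:
        return 0.0  # assumption: an all-zero vector gets a score of 0
    return dot / (norm_m * norm_p)


def sparsity(values):
    """Fraction of spots with zero expression (assumed definition)."""
    return sum(1 for v in values if v == 0) / len(values)


# Toy data, invented for illustration: gene -> expression across 5 spots.
measured = {
    "gene_dense": [3.0, 2.0, 4.0, 1.0, 5.0],
    "gene_sparse": [0.0, 0.0, 6.0, 0.0, 0.0],
}
predicted = {
    "gene_dense": [2.5, 2.0, 3.5, 1.5, 4.5],
    "gene_sparse": [1.0, 0.5, 2.0, 1.0, 0.5],
}

# One (score, sparsity) pair per gene; these pairs are the points of the curve.
per_gene = {
    gene: {
        "score": cosine_similarity(measured[gene], predicted[gene]),
        "sparsity": sparsity(measured[gene]),
    }
    for gene in measured
}

for gene, stats in per_gene.items():
    print(f"{gene}: score={stats['score']:.3f}, sparsity={stats['sparsity']:.2f}")
```

In this toy case the sparse gene ends up with the lower score, which is the pattern described above. On real data you would compute these values for every gene and look at the whole distribution rather than at a single number.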
