Why does Mixscale use raw counts for differential expression analysis instead of corrected data? #10

Open
caodudu opened this issue Oct 2, 2024 · 1 comment

Comments

caodudu commented Oct 2, 2024

Hi!
First of all, thank you for developing such a useful tool for analyzing Perturb-seq data! I have a question about the differential expression (DE) analysis in Mixscale. From reviewing the source code, I noticed that Run_wmvRegDE fits a Gamma-Poisson model to the raw count data. However, Mixscale also computes a perturbation score for each cell after correcting for confounding effects. As far as I can tell, the corrected data are used only to adjust the independent variable (the perturbation score), not the expression values being modeled.

My question is: why does Mixscale run the differential expression analysis on the raw, uncorrected counts instead of applying the regression model to the corrected data? Wouldn't a linear regression (or another model) on the corrected data account for the confounding factors more explicitly? By contrast, CINEMA-OT performs a t-test on the corrected data, although it does not adjust the perturbation label (https://github.com/vandijklab/CINEMA-OT/blob/main/cinemaot_tutorial.ipynb). Could we adjust the perturbation label and the expression matrix simultaneously?

I would appreciate any insights you can provide on this design choice. Thank you again for your work on Mixscale!

Best regards,
cdd

Collaborator

longmanz commented Oct 2, 2024

Hello,
Thank you for using our tool; we appreciate your feedback.

You raise a good point about whether the DE test should be performed on confounding-corrected data. This is a debatable question and beyond the scope of our study. We developed the CalcPerturbSig method in our previous Mixscape paper, and we found that it helps remove unseen confounding factors, such as cell-cycle effects, and helps unmask the true perturbation shift.

However, we have never tested how it would affect the DE analysis. It is always good to be cautious about correcting the data before DE tests, and this is especially true in our case because CalcPerturbSig corrects for unseen confounding (which cannot be modeled explicitly). Would it introduce bias into the data? Would it lead to over-correction (results that are too conservative)? Or would it lead to statistical inflation (false positives)? These questions are not fully explored. Therefore, we decided to go with the conventional (and safe) strategy of running DE on the uncorrected data :) I hope this answers your question. Again, thank you for using our tool.
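
For readers following the thread, the strategy discussed above (raw counts as the response, a per-cell perturbation score as the independent variable) can be illustrated with a minimal Python sketch. This is not Mixscale's code: Run_wmvRegDE is an R function that fits a Gamma-Poisson model, whereas the sketch below fits a plain Poisson regression, a simplified stand-in that drops the overdispersion parameter. The function names, the simulated data, the library-size offset, and the score range of 0 to 1 are all assumptions made for illustration.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) value (Knuth's method; fine for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_cells(seed, n_cells=400, b0=-6.0, b1=-1.0):
    """Simulate raw counts for one gene (hypothetical data, not Perturb-seq)."""
    rng = random.Random(seed)
    counts, scores, lib_sizes = [], [], []
    for i in range(n_cells):
        # Assumption: control cells score 0, perturbed cells score in [0, 1).
        score = 0.0 if i < n_cells // 2 else rng.random()
        lib_size = rng.randint(1000, 3000)
        mean = lib_size * math.exp(b0 + b1 * score)
        counts.append(sample_poisson(rng, mean))
        scores.append(score)
        lib_sizes.append(lib_size)
    return counts, scores, lib_sizes


def fit_poisson_glm(counts, scores, lib_sizes, n_iter=50, tol=1e-10):
    """Fit log E[count] = log(lib_size) + b0 + b1 * score by Newton's method.

    The response is the raw count; only the covariate carries the
    perturbation score. Returns (b0, b1, standard error of b1).
    """
    b0 = math.log(sum(counts) / sum(lib_sizes))  # start at the overall rate
    b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for y, x, n in zip(counts, scores, lib_sizes):
            mu = n * math.exp(b0 + b1 * x)
            g0 += y - mu            # gradient w.r.t. b0
            g1 += (y - mu) * x      # gradient w.r.t. b1
            h00 += mu               # Fisher information entries
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Approximate standard error from the inverse Fisher information
    # evaluated at the last Newton step.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


counts, scores, lib_sizes = simulate_cells(seed=0)
b0_hat, b1_hat, se_b1 = fit_poisson_glm(counts, scores, lib_sizes)
print(f"slope on perturbation score: {b1_hat:.3f} (SE {se_b1:.3f})")
```

In this sketch the expression values are never modified. Any correction would enter only through how the score is computed upstream, which mirrors the design choice described in the reply rather than the alternative of regressing on corrected expression.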
