Why does Mixscale use raw counts for differential expression analysis instead of corrected data? #10

caodudu · 2024-10-02T16:48:35Z

Hi!
First of all, thank you for developing such a useful tool for analyzing Perturb-seq data! I have a question regarding the differential expression analysis in Mixscale. From reviewing the source code, I noticed that Run_wmvRegDE performs the differential expression analysis on the raw count data using a Gamma-Poisson model. However, I also see that Mixscale computes perturbation scores for each cell to correct for confounding effects. It appears to me that Mixscale uses the corrected data just to adjust the independent variable.

My question is: why does Mixscale use the raw, uncorrected count data for the differential expression analysis instead of applying the regression model to the corrected data? Wouldn't a linear regression or another model on the corrected data allow for a more controlled analysis that accounts for the confounding factors more explicitly? By contrast, CINEMA-OT performs a t-test on the corrected data, though it does not adjust the perturbation label (https://github.com/vandijklab/CINEMA-OT/blob/main/cinemaot_tutorial.ipynb). Could we adjust the perturbation label and the expression matrix simultaneously?

I would appreciate any insights you can provide on this design choice. Thank you again for your work on Mixscale!

Best regards,
cdd

longmanz · 2024-10-02T19:28:31Z

Hello,
Thank you for using our tool and we appreciate your feedback.

You raise a good point on whether the DE test should be performed on confounding-corrected data or not. This is a debatable question and also beyond the scope of our study. We developed the CalcPerturbSig method in our previous Mixscape paper, and we found that it can help remove unseen confounding factors such as cell-cycle effects and help unmask the true perturbation shift.

Yet, we have never tested how it would affect the DE analysis. It is always good to be cautious about applying a correction to the data before DE tests, and this is especially true in our case because our CalcPerturbSig method corrects for unseen confounding (which cannot be modeled). Would it introduce any kind of bias into the data? Would it lead to over-correction (results that are too conservative)? Or would it lead to statistical inflation (false positives)? These questions are not fully explored. Therefore, we decided to go with the conventional (and safe) strategy of running DE on the uncorrected data :) I hope this answers your question. Again, thank you for using our tool.
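For readers following the thread, the strategy discussed here (raw counts as the response, with the perturbation score entering only as the independent variable) can be illustrated with a small sketch. This is not Mixscale's code: the function `fit_nb_glm`, the toy counts, the toy scores, and the fixed dispersion `alpha` are all hypothetical, and a real analysis would also estimate dispersion per gene and include library-size offsets and other covariates. It only shows a Gamma-Poisson (negative binomial) regression with a log link fitted to uncorrected counts, once with a binary label and once with a per-cell score.

```python
import math

def fit_nb_glm(x, y, alpha=0.1, n_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x by IRLS for a negative binomial model
    with variance mu + alpha * mu^2. alpha is fixed here (an assumption)."""
    mu = [yi + 0.5 for yi in y]          # standard starting values
    eta = [math.log(m) for m in mu]
    b0 = b1 = 0.0
    for _ in range(n_iter):
        # IRLS weights and working response for the log link
        w = [m / (1.0 + alpha * m) for m in mu]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Solve the 2x2 weighted least-squares normal equations
        sw = sum(w)
        sx = sum(wi * xi for wi, xi in zip(w, x))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        sz = sum(wi * zi for wi, zi in zip(w, z))
        sxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * sxx - sx * sx
        new_b0 = (sxx * sz - sx * sxz) / det
        new_b1 = (sw * sxz - sx * sz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        if converged:
            break
    return b0, b1

# Toy raw counts for one gene: 5 control cells, then 5 perturbed cells.
counts = [10, 12, 9, 11, 8, 9, 7, 5, 4, 2]

# Conventional design: binary perturbation label.
label = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Score-based design: hypothetical per-cell perturbation scores
# (0 for controls, larger for more strongly perturbed cells).
score = [0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

b0_label, b1_label = fit_nb_glm(label, counts)
b0_score, b1_score = fit_nb_glm(score, counts)

print("binary label: intercept, slope =", b0_label, b1_label)
print("perturbation score: intercept, slope =", b0_score, b1_score)
```

In both fits the response is the untouched count vector; only the covariate changes. With the binary label, the fitted coefficients reduce to the log of the control mean and the log ratio of the group means, whereas the score-based fit lets cells with a stronger perturbation signature contribute more to the estimated effect.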
