Inquiry on PLACO Method from PLoS Genetics Paper ([email protected]) #2
Comments
Hi @nvice111, when harmonizing two summary statistics I personally just flip one of the beta values to its negative, and EAF does not need to be flipped since EAF is not included in the matrix. Here is how I performed harmonization:
I look forward to your implementation.
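For readers following along, the flip described above might look roughly like the sketch below. It is only an illustration under assumptions: the column names (`SNP`, `EA`, `NEA`, `beta`, `se`) and the helper `harmonize_to_first` are hypothetical, trait A is arbitrarily taken as the reference, and strand or palindromic-SNP issues are not handled.

```r
# Hypothetical helper: align trait B to trait A's effect allele.
# Assumed columns in both data frames: SNP, EA (effect allele),
# NEA (other allele), beta, se.
harmonize_to_first <- function(a, b) {
  m <- merge(a, b, by = "SNP", suffixes = c(".a", ".b"))
  same    <- m$EA.a == m$EA.b  & m$NEA.a == m$NEA.b
  swapped <- m$EA.a == m$NEA.b & m$NEA.a == m$EA.b
  # Flip the sign of beta for trait B where the alleles are swapped.
  m$beta.b[swapped] <- -m$beta.b[swapped]
  # Swap the allele labels too, so they describe the flipped beta.
  old_ea <- m$EA.b[swapped]
  m$EA.b[swapped]  <- m$NEA.b[swapped]
  m$NEA.b[swapped] <- old_ea
  # Drop SNPs whose alleles cannot be matched either way.
  m <- m[same | swapped, ]
  # Z-scores for the joint analysis; EAF is not needed here.
  m$Z.a <- m$beta.a / m$se.a
  m$Z.b <- m$beta.b / m$se.b
  m
}

# Toy data: rs1 already aligned, rs2 swapped, rs3 has mismatching alleles.
a <- data.frame(SNP = c("rs1", "rs2", "rs3"), EA = c("A", "C", "G"),
                NEA = c("G", "T", "A"), beta = c(0.10, -0.20, 0.05),
                se = c(0.02, 0.05, 0.01), stringsAsFactors = FALSE)
b <- data.frame(SNP = c("rs1", "rs2", "rs3"), EA = c("A", "T", "C"),
                NEA = c("G", "C", "T"), beta = c(0.30, 0.40, 0.10),
                se = c(0.1, 0.1, 0.1), stringsAsFactors = FALSE)
h <- harmonize_to_first(a, b)
```

Which trait is flipped should not matter as long as both betas end up referring to the same effect allele; flipping trait B here is just a convention.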
The EAF parameter does not participate in the calculation, so I think your point of view is correct. For reference, here is the reply the author of the PLACO method sent me by email on November 7, 2023: Thanks for your interest in using PLACO. Responses below. Additionally, I’d suggest you read the paper thoroughly, including results from simulation experiments and materials in supplementary. 1. We are currently exploring this in a manuscript, and so I don’t have a definitive answer yet. Hope this helps.
I have used [email protected] to send an email to you containing all the questions mentioned earlier. Perhaps you could copy the content of the email response so that it can be visible to everyone on GitHub as well.
I am reaching out to discuss the PLACO technique introduced in your publication: Ray, D., Chatterjee, N. (2020) "A powerful method for pleiotropic analysis under composite null hypothesis identifies novel shared loci between Type 2 Diabetes and Prostate Cancer". PLoS Genetics 16(12): e1009218. I find the technique very useful and have several questions I hope you could clarify:
1. Can PLACO be used to analyze the relationship between a binary trait (such as ovarian cancer, from a case-control study) and a continuous variable (such as epigenetic age acceleration) with complete GWAS data, including beta, se, and pval?
2. Can PLACO be used to analyze one continuous variable (such as accelerated epigenetic aging, in units of years) against another continuous variable (such as telomere length, in units of standard deviation change in log-transformed telomere length), both having complete GWAS data including beta, standard error (se), and p-value (pval)?
3. After de-correlating the Z-scores, can PLACO be used on datasets with substantial or almost complete sample overlap, such as two datasets from the UK Biobank?
4. In your GitHub, you mentioned that PLACO should not be used for highly correlated GWAS, stating "PLACO does not work well if the two traits are strongly correlated." Does this conflict with the approach taken in the paper published in JAMA Psychiatry titled "Role of the Gut-Brain Axis in the Shared Genetic Etiology Between Gastrointestinal Tract Diseases and Psychiatric Disorders: A Genome-Wide Pleiotropic Analysis," where PLACO analysis was conducted on trait pairs that exhibited high genetic correlation? Of course, they performed de-correlation of the Z-scores.
5. You mentioned in GitHub, "When samples are related, PLACO can use the summary statistics from EMMAX (or other univariate mixed model frameworks) to appropriately test for genetic associations." However, it seems that this step is not included in the PLACO R code. Is that the case, or is this step not necessary?
6. You mentioned, "Harmonize the same effect allele across the two studies/traits so that Z-scores from the two datasets can be jointly analyzed appropriately using PLACO." For example, if I have traits A and B, and their effect alleles happen to be opposite, which trait's beta value should be flipped to its negative, and does the EAF need to be changed to 1-EAF? Or is it fine either way, as long as the beta value of one trait is flipped? Is there anything else that needs to be flipped?
7. Because PLACO's computation time is very long, I believe the code can be restructured a bit. Since the square of the Z score ranges from 0 to 80, we only need to compute VarZ once for the pair of traits, then tabulate p.placo over a grid of Z1Z2 values for that VarZ (only the absolute value of Z1Z2 is used to calculate p.placo; it ranges from 0 to 80, and I've set the grid interval to 0.001). This requires only about 80,000 calculations, and the result will be the same as the original code up to the rounding of Z1Z2 to the grid (the original code computes p.placo separately for each SNP in the entire GWAS, totaling millions of calculations). If you find any errors, please let me know. I have only changed the order of computation, not the calculation rules of PLACO.
Here is my code:
#First calculate VarZ, which is different for each pair of traits.
print("Now let's start deriving the Zplus quantity.")
# NOTE: the function header is missing from the post; the name and the AbsTol default below are assumed.
p.placo.zplus<-function(zplus, varz, AbsTol=.Machine$double.eps^0.8){
p1<-2*as.double(integrate(Vectorize(.pdfx),abs(zplus/sqrt(varz[1])),Inf,abs.tol=AbsTol)$value)
p2<-2*as.double(integrate(Vectorize(.pdfx),abs(zplus/sqrt(varz[2])),Inf,abs.tol=AbsTol)$value)
p0<-2*as.double(integrate(Vectorize(.pdfx), abs(zplus),Inf, abs.tol=AbsTol)$value)
pval.compnull<-p1+p2-p0
return(pval.compnull)
}
combined_df$Zplus <- as.character(combined_df$Zplus)
#Convert to string type, otherwise there will be errors in matching floating-point numbers.
results$Zplus <- as.character(results$Zplus)
I appreciate your time and assistance in answering these questions and am looking forward to implementing the PLACO method in my research.
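The lookup idea from question 7 can be sketched end to end as below. This is a minimal illustration under stated assumptions, not the PLACO authors' code: `p_from_zplus` and `placo_lookup` are hypothetical names, the `varz` values are made up, and `.pdfx` is written here as the Bessel-function density of a product of two independent standard normals. Two deliberate differences from the post: p-values are computed only for grid points that actually occur in the data rather than for the full 0 to 80 grid, and rows are matched on an integer grid index rather than on character strings. Results agree with a per-SNP computation only up to the rounding of |Z1Z2| to the grid step.

```r
# Density of the product of two independent N(0, 1) variables.
.pdfx <- function(x) besselK(x = abs(x), nu = 0) / pi

# Composite-null p-value for a single value of |Z1 * Z2| (hypothetical helper).
p_from_zplus <- function(zplus, varz, AbsTol = .Machine$double.eps^0.8) {
  tail2 <- function(lower) {
    2 * as.double(integrate(Vectorize(.pdfx), lower, Inf, abs.tol = AbsTol)$value)
  }
  tail2(zplus / sqrt(varz[1])) + tail2(zplus / sqrt(varz[2])) - tail2(zplus)
}

# Round |Z1 * Z2| to a grid, integrate once per distinct grid point,
# then map the p-values back to every SNP.
placo_lookup <- function(Z, varz, step = 0.001) {
  key  <- round(abs(Z[, 1] * Z[, 2]) / step)  # integer-valued grid index
  ukey <- unique(key)
  p_grid <- vapply(ukey * step, p_from_zplus, numeric(1), varz = varz)
  p_grid[match(key, ukey)]
}

# Toy example: four SNPs, two traits; varz is an assumed value here and
# would come from the VarZ step in a real analysis.
Z <- cbind(c(1.5, -2.0, 0.5, 3.0), c(2.0, 1.0, -0.5, 1.5))
varz <- c(1.2, 1.1)
p_lookup <- placo_lookup(Z, varz)
p_direct <- apply(Z, 1, function(z) p_from_zplus(abs(prod(z)), varz))
```

Matching on `round(x / step)` sidesteps the floating-point comparison problem that the `as.character()` conversion works around, and restricting the grid to observed values means the number of integrations never exceeds the number of SNPs.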