How to implement local Bayesian regressions with gene-by-sex interactions. It includes a toy dataset (simulated genotypes and phenotypes) for which to test LBR on your local computer.
- Repository contents
- Description of data
- Initial Setup
- Fitting LBR with sex interactions
- Processing posterior samples
- Example plots
- Simulated genotype data (sample size = 5000, number of SNPs = 1000)
- Simulated phenotype data for all 5000 individuals
- A demonstration for how to fit LBR with sex interactions using BGLR
- Functions for processing BGLR output to:
- Estimate sex-specific effects at the level of individual SNP markers or LD-based regions.
- Infer nonzero sex-specific effects and GxS interactions.
Genotypes supplied in this repository were simulated using knowledge of real LD patterns.
Briefly, genotypes for SNPj were randomly sampled from the conditional distribution of SNPj given SNPj-1, in which the conditional distribution was estimated using UK Biobank genotypes.
A small dataset (N = 5000) is provided here so that one can easily test these methods on their personal computer.
Given the small sample size, one causal variant was used to explain a un-realistically large proportion of phenotypic variance (5% for both males and females).
The additive effect of the causal variant is three times larger in females than in males. Given that the heritability of the causal variant is the same in males and females, this results in both larger genomic variance and larger residual variance in females than in males.
First clone this repository, cd into it, and launch the R interpreter. This tutorial assumes you have R version 3.6.0 or newer installed on your computer.
git clone https://github.com/funkhou9/LBR-sex-interactions.git
cd LBR-sex-interactions
R
The packages BGLR and BGData are required to fit the LBR model. The tidyverse package is recommended for downstream processing of posterior samples.
install.packages(c("BGLR", "BGData", "tidyverse"))
library(BGLR)
library(BGData)
library(tidyverse)
Source additional functions provided in this repository
source("src/getWindows.R")
source("src/process_samples.R")
Load toy genotype data, containing simulated genotype matrix XS
.
Load toy phenotype data, containing:
- Simulated phenotypes
y
and identifiers for malesmen
and femaleswomen
. - Identifier for the causal variant
cv
- Male-specific effect size for the causal variant
b_male
and female-specific effect size for the causal variantb_female
load("data/genotypes/DATA.RData")
load("data/phenotypes/DATA.RData")
Center and scale genotypes and phenotypes, and build sex-specific interaction
terms WF
and WM
.
W <- scale(XS)
y <- scale(y)
WF <- W
dF <- ifelse(rownames(WF) %in% women, 1, 0)
for(i in 1:ncol(W)) {
WF[, i] <- W[, i] * dF
}
WM <- W
dM <- ifelse(rownames(WM) %in% men, 1, 0)
for(i in 1:ncol(W)) {
WM[, i] <- W[, i] * dM
}
Specify hyperparameters for each element of the linear predictor:
- The main effect of sex, given a uniform prior,
- The main genetic effect, given a BayesC prior ,
- The female-specific interaction, given a BayesC prior,
- The male-specific interaction, given a BayesC prior.
BayesC priors, also known as point-normal priors, are parameterized by:
- Pi, the proportion of non-null SNP effects. This is treated as random and given a Beta
prior (controlled by
counts
andprobIn
arguments). - The variance of non-null SNP marker effects. This is treated as random and given a
scaled-inverse Chi-square prior (controlled by
R2
argument).
See the supplementary material from Perez and de los Campos 2014 for more information on prior densities used in BGLR.
NOTE: When fitting real data, it is necessary to fit or pre-correct for additional covariates, which may include age, batch, genomic PCs, and genotyping center.
ETA <- list(
FIX = list(X = as.matrix(dF),
model = 'FIXED'),
main = list(X = W,
R2 = 0.03,
model = "BayesC",
probIn = 0.5,
counts = 4,
saveEffects = TRUE),
female = list(X = WF,
R2 = 0.01,
model = "BayesC",
probIn = 0.5,
counts = 4,
saveEffects = TRUE),
male = list(X = WM,
R2 = 0.01,
model = "BayesC",
probIn = 0.5,
counts = 4,
saveEffects = TRUE)
)
Run BGLR. BGLR samples from the fully conditional posterior distributions
of interest using MCMC. Run 1200 iterations of the MCMC sampler, discarding the
initial 200 samples and thinning the samples by 2. BGLR allows for group-specific
residual error variances, using the groups
argument.
if (!dir.exists("LBR_results/")) dir.create("LBR_results/")
fm <- BGLR(y = y,
ETA = ETA,
groups = dF,
R2 = 0.05,
nIter = 1200,
burnIn = 200,
thin = 2,
saveAt = "LBR_results/")
One can utilize the function provided in src/process_samples.R
.
For each SNP, process_samples()
provides:
- Sex-specific SNP effect estimates
- Posterior probability of non-zero sex-specific SNP effects
- Posterior probability of sex difference in SNP effects
For each SNP, process_samples()
also aggregates the effects of nearby SNPs within
LD based windows by estimating sex-specific "window variances". This provides:
- Sex-specific window variance estimates
- Posterior probability of non-zero sex-specific window variances
- Posterior probability of sex difference in window variances
results <- process_samples(X = W,
main = "LBR_results/ETA_main_b.bin",
male_int = "LBR_results/ETA_male_b.bin",
female_int = "LBR_results/ETA_female_b.bin")
The results
object is a tibble with rows corresponding to each SNPj
and the following columns:
SNP
: SNP nameb_male
: male-specific SNP effectb_female
: female-specific SNP effectPP_male
: Posterior probability of a non-zero SNP effect in malesPP_female
: Posterior probability of a non-zero SNP effect in femalesPP_diff
: Posterior probability of a sex difference in SNP effectsb_male_window
: male-specific window variance, centered on SNPjb_female_window
: male-specific window variance, centered on SNPjPP_male_window
: Posterior probability of a non-zero window variance in malesPP_female_window
: Posterior probability of a non-zero window variance in femalesPP_diff_window
: Posterior probability of a sex difference in window variances
Plotting male-specific SNP effects against female-specific SNP effects and coloring points by the posterior probability of sex difference.
ggplot(results, aes(x = b_male, y = b_female, color = PP_diff)) +
geom_point() +
geom_abline(slope = 1, intercept = 0, col = "blue", alpha = 0.3) +
geom_vline(xintercept = 0, col = "black", linetype = "dashed", alpha = 0.5) +
geom_hline(yintercept = 0, col = "black", linetype = "dashed", alpha = 0.5) +
scale_color_gradient(name = expression(PPDiff[SNP]),
high = "#f44242") +
ylim(-0.2, 0.2) +
xlim(-0.2, 0.2) +
labs(x = expression(hat(beta)[m]), y = expression(hat(beta)[f]))
Miami-like plot to show male- and female-specific window variance estimates, coloring points by posterior probability of sex-difference. The true position of the causal variant is marked with a dashed blue line.
ggplot() +
geom_point(data = results,
aes(x = seq_along(SNP),
y = b_male_window,
color = PP_diff_window)) +
geom_segment(data = results,
aes(x = seq_along(SNP),
y = b_male_window,
color = PP_diff_window,
xend = seq_along(SNP),
yend = 0)) +
geom_point(data = results,
aes(x = seq_along(SNP),
y = -b_female_window,
color = PP_diff_window)) +
geom_segment(data = results,
aes(x = seq_along(SNP),
y = -b_female_window,
color = PP_diff_window,
xend = seq_along(SNP),
yend = 0)) +
scale_color_gradient(name = expression(PPDiff[sigma[g]^2]),
low = "#5D6D7E",
high = "#f44242") +
geom_vline(xintercept = cv, linetype = "dashed", color = "blue", alpha = 0.5) +
ylim(-0.05, 0.05) +
geom_hline(yintercept = 0, color = "white") +
labs(x = "Focal SNP",
y = "-(Est. female window variance) +(Est. male window variance)")