Skip to content

Code for the paper "An empirical Bayes approach to estimating dynamic models of co-regulated gene expression" (published in Data Science in Science, 2023)

Notifications You must be signed in to change notification settings

sara-venkatraman/Bayesian-Gene-Dynamics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

86 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Dynamic models of co-regulated gene expression

The code in this repository implements the Bayesian lead-lag R2 (LLR2) algorithm introduced in Venkatraman et al. (2021) for biologically-informed clustering and network analysis of time-course gene expression data. For details of how it works, see Section 3 of our paper. Below is a short demo.

This repository consists of the following R files:

  • 1_LLR2.R: contains the function LLR2, which computes the Bayesian LLR2 similarity matrix for a given gene expression dataset
  • 2_PlottingFunctions.R: contains functions for plotting temporal gene expression trajectories
  • 3_Results.R: produces all figures and tables in our paper

Getting started

We begin by loading a N × n time-course gene expression dataset, consisting of N genes whose expressions are recorded at n time points, and a N × N prior adjacency matrix. The latter contains entries 0, 1, or NA to indicate that a biological relationship between the corresponding two genes is unlikely, likely, or unknown according to external sources.

# Read gene expression data and prior adjacency matrix
geneData <- read.csv("Data/GeneData.csv", row.names=1)
priorMatrix <- read.csv("Data/PriorMatrix.csv", row.names=1)

# Define hours corresponding to each time point
hours <- c(0, 1, 2, 4, 5, 6, 8, 10, 12, 14, 16, 20, 24, 30, 36, 42, 48)

This data comes from Schlamp et al. (2021), whose GitHub repository can be found here, and the external information comes from the STRING database (Szklarczyk et al. 2019).

Next we run the Bayesian LLR2 algorithm. For this demo, we’ll run it on a random subset of 150 genes.

# Select random subset of genes
set.seed(12657)
indices <- sample(1:nrow(geneData), size=150)
geneData <- geneData[indices,]
priorMatrix <- priorMatrix[indices, indices]

# Run Bayesian LLR2 similarity matrix computations
source("1_LLR2.R")
bayesLLR2 <- LLR2(geneData, hours, bayes=TRUE, priorMatrix, writeToCSV=FALSE)

Clustering with the Bayesian LLR2 metric

We now perform hierarchical clustering on the Bayesian LLR2 similarity matrix.

# Turn Bayesian LLR2 similarity matrix into a distance matrix
distanceMatrix <- as.dist(1 - bayesLLR2$LLR2Mat)
hierClust <- hclust(distanceMatrix, method="ward.D")

# Divide into 12 clusters
geneClusters <- cutree(hierClust, k=12)

Next we plot the genes in each cluster:

# Load plotting functions (we use Plot.Genes() below)
source("2_PlottingFunctions.R")

# Initialize list of plots and choose plot colors
plotList <- list()
plotColors <- c(brewer.pal(8, "Dark2"), brewer.pal(4, "Set1"))

# Plot expression trajectories of genes in each cluster
for(i in 1:12) {
  numGenes <- table(geneClusters)[i]
  plotList[[i]] <- Plot.Genes(geneData[geneClusters == i,], hours, points=FALSE, plotColors=rep(plotColors[i], numGenes), plotTitle=paste("Cluster ", i, " (", numGenes, " genes)", sep=""), plotLegend=FALSE, lineOpacity=0.6)
}
grid.arrange(grobs=plotList, ncol=4)

References

Schlamp, Florencia, Sofie YN Delbare, Angela M Early, Martin T Wells, Sumanta Basu, and Andrew G Clark. 2021. “Dense Time-Course Gene Expression Profiling of the Drosophila Melanogaster Innate Immune Response.” BMC Genomics 22 (1): 1–22.

Szklarczyk, Damian, Annika L Gable, David Lyon, Alexander Junge, Stefan Wyder, Jaime Huerta-Cepas, Milan Simonovic, et al. 2019. “STRING V11: Protein–Protein Association Networks with Increased Coverage, Supporting Functional Discovery in Genome-Wide Experimental Datasets.” Nucleic Acids Research 47 (D1): D607–13.

Venkatraman, Sara, Sumanta Basu, Andrew G Clark, Sofie Delbare, Myung Hee Lee, and Martin T Wells. 2021. “An Empirical Bayes Approach to Estimating Dynamic Models of Co-Regulated Gene Expression.” bioRxiv.

About

Code for the paper "An empirical Bayes approach to estimating dynamic models of co-regulated gene expression" (published in Data Science in Science, 2023)

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published