None of this code ran due to Azimuth error.
Impossible to solve even uninstalling R.
Integrative analysis can help to match shared cell types and states across datasets.
This leads to boosted statistical power and facilitate accurate comparative analysis across datasets.
There are many powerful methods: Harmony and scVI.
However, which methods should we use and how do we not loose biological resolution?
In Seurat, you can run different integration algorithms with single line code.
library(tidyverse)
library(ggplot2)
library(Seurat)
library(SeuratData)
library(SeuratWrappers)
library(Azimuth)
library(patchwork)
options(future.globals.maxSize = 1e9)
Seurat assays store data in layers.
These layers usually have two sets:
-
Counts: un-normalized raw counts
-
Data: normalized data
-
scale.data: z-scored/variance - stabilized data
Azimuth has too many errors regarding TFBSTools that I wasn’t able to resolve
InstallData("pbmcsca")
obj <- LoadData("pbmcsca")
obj <- subset(obj, nFeature_RNA > 1000)
# We use Azimuth to obtain predicted cell annotations
obj <- RunAzimuth(obj, reference = "pbmcref")
obj
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)
obj
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)
obj <- FindNeighbors(obj, dims = 1:30, reduction = "pca")
obj <- FindClusters(obj, resolution = 2, cluster.name = "unintegrated_clusters")
obj <- RunUMAP(obj, dims = 1:30, reduction = "pca", reduction.name = "umap.unintegrated")
DimPlot(obj, reduction = "umap.unintegrated", group.by = c("Method", "predicted.celltype.l2"))
5 methods
- Anchor based CCA integration (CCAIntegration)
Canonical Correlation Analysis.
CCA is well suited when cell types are conserved but thare are substantial differences in gene expression across experiments
CCA enables integrative analysis when experimental conditions or disease states introduce very strong expression shifts
or when integrating across modalities and species.
CCA can also lead to overcorrection, when large proportions of cells are non overlapping across datasets.
- Anchor based RPCA integration (RPCAIntegration)
Reciprocal PCA Integration.
When determining how to integrate two datasets, this method projects each PCA space to the other PCA space.
Runs substantially faster and is a more conservative approach where cells in different biological states are less likely to “align” after integration
This method is recommended for
-
Substantial fraction of cells in one dataset have no matching type in the other
-
Datasets originate from the same platform
-
There are large number of datasets or cells to integrate.
Constrains the anchors by the same mutual neighborhood requirement.
-
Harmony (HarmonyIntegration)
-
FastMNN (FastMNNIntegration)
-
scVI (scVIIntegration)
Different Integrations lead to different UMAP.
Once integrations end, you have to rejoin the layers
obj <- JoinLayers(obj)
obj