Integrative Analysis in Seurat

None of this code was run because of an Azimuth error.

I could not resolve it, even after uninstalling R.

Introduction to Single-cell RNA Sequencing Integration

Integrative analysis can help to match shared cell types and states across datasets.

This boosts statistical power and facilitates accurate comparative analysis across datasets.

There are many powerful methods, such as Harmony and scVI.

However, which method should we use, and how do we avoid losing biological resolution?

In Seurat, you can run different integration algorithms with a single line of code.

Load libraries

library(tidyverse)
library(ggplot2)

library(Seurat)
library(SeuratData)
library(SeuratWrappers)

library(Azimuth)

library(patchwork)
options(future.globals.maxSize = 1e9)

Load Dataset

Seurat assays store data in layers.

These layers usually come in three sets:

  • counts: un-normalized raw counts

  • data: normalized data

  • scale.data: z-scored/variance-stabilized data
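To make the three layers concrete, here is a minimal base-R sketch of what each one holds conceptually. It assumes log-normalization with a scale factor of 10,000 (the default of `NormalizeData()`) and a plain per-gene z-score; the toy matrix and variable names are made up for illustration, and Seurat's own `NormalizeData()` and `ScaleData()` offer more options and should be used in practice.

```r
# Toy count matrix: 3 genes (rows) x 4 cells (columns); values are made up.
counts <- matrix(
  c(0, 2, 5, 1,
    3, 0, 1, 4,
    1, 1, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# "data": scale each cell to 10,000 total counts, then take log(1 + x).
data <- log1p(sweep(counts, 2, colSums(counts), "/") * 1e4)

# "scale.data": z-score each gene (row) across cells.
scale_data <- t(scale(t(data)))

round(rowMeans(scale_data), 10)  # per-gene means after scaling
apply(scale_data, 1, sd)         # per-gene standard deviations after scaling
```

After scaling, every gene has mean 0 and standard deviation 1 across cells, which is what keeps highly expressed genes from dominating PCA.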

Load Data

Azimuth raised too many TFBSTools-related errors that I wasn’t able to resolve.

InstallData("pbmcsca")
obj <- LoadData("pbmcsca")
obj <- subset(obj, nFeature_RNA > 1000)
# We use Azimuth to obtain predicted cell annotations
obj <- RunAzimuth(obj, reference = "pbmcref")
obj

Split by method

obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)
obj
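Since the real dataset could not be loaded here, the effect of `split()` can be previewed on a toy object. This is a sketch that assumes Seurat v5 (layer-based assays); the random counts and the batch labels "A", "B", "C" are invented stand-ins for `obj$Method`.

```r
library(Seurat)

set.seed(42)
# Hypothetical toy data: 200 genes x 60 cells of random Poisson counts.
counts <- matrix(
  rpois(200 * 60, lambda = 1), nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("cell", 1:60))
)
toy <- CreateSeuratObject(counts = counts)

# Made-up batch label standing in for obj$Method.
toy$Method <- rep(c("A", "B", "C"), each = 20)

Layers(toy[["RNA"]])  # layers before splitting
toy[["RNA"]] <- split(toy[["RNA"]], f = toy$Method)
Layers(toy[["RNA"]])  # layers after splitting: one counts layer per Method value
```

Splitting does not create new objects; the cells stay in one Seurat object, and only the assay's layers are partitioned by batch.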

Seurat Standard Workflow

obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)
obj <- FindNeighbors(obj, dims = 1:30, reduction = "pca")
obj <- FindClusters(obj, resolution = 2, cluster.name = "unintegrated_clusters")

obj <- RunUMAP(obj, dims = 1:30, reduction = "pca", reduction.name = "umap.unintegrated")
DimPlot(obj, reduction = "umap.unintegrated", group.by = c("Method", "predicted.celltype.l2"))
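The `DimPlot` above is a visual check of whether cells group by `Method` rather than by cell type. As a rough numeric companion, one could tabulate the method composition of each cluster. The sketch below uses made-up label vectors; with the real object they would presumably be `obj$unintegrated_clusters` and `obj$Method`.

```r
# Hypothetical toy labels standing in for the cluster and Method metadata.
clusters <- c("0", "0", "0", "1", "1", "1", "2", "2", "2", "2")
method   <- c("A", "A", "A", "B", "B", "B", "A", "B", "A", "B")

# Rows are clusters, columns are methods; each row sums to 1.
composition <- prop.table(
  table(cluster = clusters, method = method),
  margin = 1
)
composition
```

Clusters made up almost entirely of one method (like clusters 0 and 1 in this toy example) hint at a batch effect, while mixed clusters (like cluster 2) suggest shared cell states.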

Perform streamlined (one-line) Integrative Analysis

Five methods are available:

  1. Anchor-based CCA integration (CCAIntegration)

Canonical Correlation Analysis.

CCA is well suited when cell types are conserved but there are substantial differences in gene expression across experiments.

CCA enables integrative analysis when experimental conditions or disease states introduce very strong expression shifts, or when integrating across modalities and species.

CCA can also lead to overcorrection when large proportions of cells are non-overlapping across datasets.

  2. Anchor-based RPCA integration (RPCAIntegration)

Reciprocal PCA integration.

When determining anchors between two datasets, this method projects each dataset into the other's PCA space and constrains the anchors by the same mutual neighborhood requirement.

It runs substantially faster and is a more conservative approach: cells in different biological states are less likely to “align” after integration.

This method is recommended when:

  • A substantial fraction of cells in one dataset have no matching type in the other

  • Datasets originate from the same platform

  • There are a large number of datasets or cells to integrate

  3. Harmony (HarmonyIntegration)

  4. FastMNN (FastMNNIntegration)

  5. scVI (scVIIntegration)
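Each of the five methods is invoked through `IntegrateLayers()`, changing only the `method` argument. The sketch below follows the calling pattern of the Seurat v5 integration vignette and assumes `obj` has already been split, normalized, and run through PCA as in the sections above. The names passed to `new.reduction` are arbitrary labels, Harmony and FastMNN need the `harmony` and `batchelor` packages installed, and `scvi_env` is a placeholder path to a local conda environment with scvi-tools, not a real location.

```r
library(Seurat)
library(SeuratWrappers)  # provides FastMNNIntegration and scVIIntegration

# 1. Anchor-based CCA
obj <- IntegrateLayers(
  object = obj, method = CCAIntegration,
  orig.reduction = "pca", new.reduction = "integrated.cca",
  verbose = FALSE
)

# 2. Anchor-based RPCA
obj <- IntegrateLayers(
  object = obj, method = RPCAIntegration,
  orig.reduction = "pca", new.reduction = "integrated.rpca",
  verbose = FALSE
)

# 3. Harmony
obj <- IntegrateLayers(
  object = obj, method = HarmonyIntegration,
  orig.reduction = "pca", new.reduction = "harmony",
  verbose = FALSE
)

# 4. FastMNN
obj <- IntegrateLayers(
  object = obj, method = FastMNNIntegration,
  new.reduction = "integrated.mnn",
  verbose = FALSE
)

# 5. scVI: runs only if the (placeholder) conda environment exists locally.
scvi_env <- "path/to/scvi-env"  # hypothetical path; replace with your own
if (dir.exists(scvi_env)) {
  obj <- IntegrateLayers(
    object = obj, method = scVIIntegration,
    new.reduction = "integrated.scvi",
    conda_env = scvi_env, verbose = FALSE
  )
}
```

Every call stores its result as a separate dimensional reduction on the same object, so the methods can be compared without overwriting one another.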

Different integration methods lead to different UMAP embeddings.
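To compare embeddings, the clustering and UMAP steps from the standard workflow can be repeated on each integrated reduction. The helper below is hypothetical (it is not part of Seurat) and simply wraps `FindNeighbors()`, `FindClusters()`, and `RunUMAP()` for one reduction; the `reduction` argument is assumed to be whatever name was given to `new.reduction` in `IntegrateLayers()`.

```r
library(Seurat)

# Hypothetical helper: cluster and embed the cells using a single reduction.
# Results are stored under names derived from the reduction name.
umap_for_reduction <- function(obj, reduction, dims = 1:30, resolution = 2) {
  cluster_name <- paste0(reduction, "_clusters")
  umap_name <- paste0("umap.", reduction)
  obj <- FindNeighbors(obj, reduction = reduction, dims = dims)
  obj <- FindClusters(obj, resolution = resolution, cluster.name = cluster_name)
  obj <- RunUMAP(obj, reduction = reduction, dims = dims, reduction.name = umap_name)
  obj
}

# Example usage, assuming a reduction called "integrated.cca" exists:
# obj <- umap_for_reduction(obj, "integrated.cca")
# DimPlot(obj, reduction = "umap.integrated.cca",
#         group.by = c("Method", "predicted.celltype.l2"))
```

Plotting each resulting UMAP grouped by `Method` and by predicted cell type shows how strongly each method mixes batches while keeping cell types apart.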

Once integration is complete, you have to rejoin the layers:

obj <- JoinLayers(obj)
obj