scRNA-seq Analysis for MSc Research Project

Overview

This repository contains the code for the scRNA-seq analysis conducted as part of my MSc research project. The analysis is divided into two main pipelines:

Upstream Analysis Pipeline
Downstream Analysis Pipeline

These pipelines are designed to ensure efficient and reproducible workflows for processing and analysing single-cell RNA sequencing (scRNA-seq) data.

1. Upstream Analysis Pipeline

The upstream analysis pipeline focuses on the initial processing and preparation of the raw scRNA-seq data. The steps included in this pipeline are:

Raw Data (Count Table):
The pipeline begins with the raw count table obtained from sequencing.
Data Cleaning:
This step involves filtering and cleaning the raw data to remove any irrelevant or low-quality entries.
Quality Control:
Quality control checks are performed to remove any cells or genes that do not meet specific criteria.
Doublet Removal:
This step identifies and removes potential doublets (two cells captured together and profiled as a single cell) so that they do not skew the analysis.
Normalisation:
Counts are normalised so that each cell's total equals the median total count across cells, and are then log1p-transformed.
Batch Correction:
The data are batch-corrected with scVI to account for sample-level differences.
PCA (Principal Component Analysis):
PCA is performed to reduce the dimensionality of the data, helping to identify major trends and patterns.
kNN/UMAP:
A k-nearest-neighbour (kNN) graph is built from the reduced representation, and UMAP is applied to visualise the data in lower dimensions.
Clustering:
Cells are clustered based on similarity, identifying groups of cells with similar expression profiles.
Cell Annotation:
Finally, the clusters are annotated to assign biological meaning, identifying different cell types or states.

The output of this pipeline is an annotated data table that serves as the input for downstream analysis.
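To make the normalisation step concrete, here is a minimal sketch in plain Python. It is not the repository's code: the function name `normalise_log1p` and the toy count table are hypothetical, and the choice to take the median over non-empty cells only is an assumption.

```python
import math
from statistics import median


def normalise_log1p(counts):
    """Scale each cell to the median total count, then apply log1p.

    counts: list of per-cell lists of raw gene counts (cells x genes).
    Assumes at least one cell has a non-zero total.
    """
    totals = [sum(cell) for cell in counts]
    # Assumption: the target is the median over cells with non-zero totals.
    target = median(t for t in totals if t > 0)
    normalised = []
    for cell, total in zip(counts, totals):
        # Empty cells stay at zero instead of triggering a division by zero.
        scale = target / total if total > 0 else 0.0
        normalised.append([math.log1p(c * scale) for c in cell])
    return normalised


# Hypothetical toy count table: 3 cells x 3 genes.
raw = [
    [10, 0, 30],  # total 40
    [5, 5, 10],   # total 20
    [0, 20, 60],  # total 80
]
norm = normalise_log1p(raw)
```

Undoing the log transform with `math.expm1` and summing each row of `norm` should give the same total for every cell, which is the point of the scaling step.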

2. Downstream Analysis Pipeline

The downstream analysis pipeline focuses on deriving biological insights from the annotated data table produced by the upstream pipeline. The steps include:

DEG (Differentially Expressed Gene) Analysis:
Identification of genes that are differentially expressed between conditions or clusters. Scanpy, DESeq2 and limma-voom methods are all provided.
Gene Ontology (GO) Analysis:
GO analysis is performed to identify biological processes, cellular components, and molecular functions that are enriched in the differentially expressed genes.
- GO Enrichment Map:
  Visualisation of the GO terms enriched in the dataset.
KEGG Enrichment Analysis:
KEGG pathway analysis identifies pathways that are enriched among the differentially expressed genes.
- Manual KEGG Pathway Analysis:
  Further manual curation and interpretation of the KEGG pathways to understand the underlying biological processes.
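The idea of a term or pathway being "enriched" among the differentially expressed genes can be illustrated with an over-representation test. The sketch below uses a one-sided hypergeometric test, which is a common basis for GO and KEGG enrichment. Whether the scripts in this repository use exactly this test is an assumption, and the function name, parameter names and numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_degs, n_overlap):
    """One-sided hypergeometric p-value for over-representation.

    n_background: number of genes in the background set
    n_in_term:    background genes annotated to the term or pathway
    n_degs:       number of differentially expressed genes
    n_overlap:    DEGs that are annotated to the term or pathway

    Returns P(X >= n_overlap) when n_degs genes are drawn at random
    from the background without replacement.
    """
    max_overlap = min(n_in_term, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_degs - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20 background genes, 5 in the term,
# 4 DEGs, 3 of which fall in the term.
p = enrichment_p_value(n_background=20, n_in_term=5, n_degs=4, n_overlap=3)
```

In practice one p-value is computed per term or pathway, so the results would normally be corrected for multiple testing before interpretation.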

How to Use

To run the analysis, follow the instructions in the respective Jupyter notebooks and R scripts available in the repository.

Repository folders and files:
1. documents
2. images
3. upstream_analysis
4. downstream_analysis
5. results_repository
README.md

Repository: GlennRDx/scRNAseq-MSc-Analysis