scOPE is a transfer-learning framework that uses bulk RNA-seq cohorts with known mutation status to learn a compact, biologically meaningful latent space, then projects single-cell RNA-seq into that same space to predict the probability that specific cancer-associated gene mutations are present in individual cells. This provides mutation-informed subclonal structure that can complement CNV methods and increase subclonal granularity.
scOPE proceeds in two phases:
-
i. Bulk expression matrix
Construct a bulk cohort expression matrix A_bulk (rows = patient samples, columns = genes).
The bulk matrix is normalized / centered / scaled to ensure comparable gene-wise signal. -
ii. Latent feature mapping via SVD
Decompose the normalized bulk matrix using SVD:- U_bulk: sample scores (rows = patients, columns = latent factors)
- Σ_bulk: diagonal matrix of singular values
- V: gene loadings (rows = genes, columns = latent factors)
Define the bulk latent representation (patient-by-factor embedding):
-
iii. Train mutation-prediction models in latent space
For each mutation / gene-of-interest, train a supervised ML model to predict mutation presence Y from Z_bulk.
This yields one (or multiple) mutation-specific classifiers operating on the learned latent factors.
-
i. Single-cell expression matrix
Construct a single-cell expression matrix A_sc (rows = single cells, columns = genes). -
ii. Normalize scRNA-seq using bulk-derived parameters
Apply the same gene-wise normalization / centering / scaling learned from the bulk cohort to obtain A′_sc.
(This alignment step makes the projection comparable across bulk and single-cell.) -
iii. Project cells into latent space and infer mutation probabilities
Use the bulk-derived gene loadings V to compute the single-cell latent representation:Then apply the trained bulk models to Z_sc to predict per-cell mutation probabilities, producing mutation-informed cellular maps that can be analyzed alongside expression programs, clusters, and CNV signals.
pip install scope-biopip install scope-bio[full]conda install -c conda-forge scope-biogit clone https://github.com/Ashford-A/scOPE.git
cd scOPE
conda env create -f environments/scope-dev.yml
conda activate scope-dev
pip install -e ".[dev]"import anndata as ad
import pandas as pd
from scope import BulkPipeline, SingleCellPipeline
from scope.io import load_mutation_labels
# --- Phase 1: Bulk --------------------------------------------------------
adata_bulk = ad.read_h5ad("bulk_cohort.h5ad")
mutation_labels = load_mutation_labels("mutations.csv", sample_col="sample_id")
bulk_pipe = BulkPipeline(
norm_method="cpm",
decomposition="svd", # "svd" | "nmf" | "ica" | "pca"
n_components=50,
classifier="logistic", # "logistic" | "random_forest" | "gbm" | "xgboost" | "lightgbm" | "svm" | "mlp"
)
bulk_pipe.fit(adata_bulk, mutation_labels, cv=5)
bulk_pipe.save("models/bulk_pipeline.pkl")
# --- Phase 2: Single cell --------------------------------------------------
adata_sc = ad.read_h5ad("sc_tumor.h5ad")
# Prepare a preprocessed bulk reference for moment matching
adata_bulk_pp = bulk_pipe.preprocessor_.transform(adata_bulk)
sc_pipe = SingleCellPipeline(
bulk_pipeline=bulk_pipe,
alignment_method="z_score_bulk", # "z_score_bulk" | "moment_matching" | "quantile" | "none"
)
sc_pipe.fit(adata_bulk_pp, adata_sc)
adata_sc = sc_pipe.transform(adata_sc)
# adata_sc.obs now contains columns: mutation_prob_KRAS, mutation_prob_TP53, ...
# --- Visualise -------------------------------------------------------------
from scope.visualization import compute_umap, plot_mutation_probabilities
adata_sc = compute_umap(adata_sc, obsm_key="X_svd")
fig = plot_mutation_probabilities(adata_sc, mutations=["KRAS", "TP53"])
fig.savefig("mutation_probs.pdf", bbox_inches="tight")| Class | Description |
|---|---|
BulkNormalizer |
CPM / TPM / median-ratio / TMM normalisation |
BulkScaler |
Gene-wise centering and scaling |
BulkPreprocessor |
Combined normalise + scale |
SingleCellPreprocessor |
QC filter + normalise + optional scale |
BulkSCAligner |
z-score / moment-matching / quantile alignment |
| Class | Description |
|---|---|
SVDDecomposition |
Truncated SVD (randomized / ARPACK / full) |
NMFDecomposition |
Non-negative matrix factorization |
ICADecomposition |
FastICA |
PCADecomposition |
PCA via sklearn |
get_decomposition(name) |
Factory function |
| Class | Description |
|---|---|
LogisticMutationClassifier |
L1/L2/ElasticNet logistic regression |
RandomForestMutationClassifier |
Random forest |
GBMMutationClassifier |
Gradient boosting (sklearn) |
XGBMutationClassifier |
XGBoost |
LGBMMutationClassifier |
LightGBM |
SVMMutationClassifier |
SVM + Platt calibration |
MLPMutationClassifier |
Multi-layer perceptron |
PerMutationClassifierSet |
Trains/stores one classifier per mutation |
get_classifier(name) |
Factory function |
| Function | Description |
|---|---|
evaluate_classifier |
AUROC, AUPRC, Brier score |
evaluate_all |
Evaluate all mutations at once |
cross_validate_classifiers |
Stratified k-fold CV |
roc_curve_data / pr_curve_data |
Curve arrays for plotting |
| Function | Description |
|---|---|
compute_umap |
UMAP on latent embedding |
compute_tsne |
t-SNE on latent embedding |
plot_embedding |
Scatter by categorical or continuous |
plot_mutation_probabilities |
Grid of per-mutation probability overlays |
plot_scree |
Singular value / EVR scree plot |
plot_mutation_heatmap |
Mean probability per cluster heatmap |
pytest tests/ -v --cov=scope --cov-report=term-missingIf you use scOPE in your research, please cite:
Ashford, A. et al. (2024). scOPE: transfer-learning from bulk RNA-seq to infer per-cell mutation probabilities in single-cell transcriptomics. [Journal] doi: ...
(Work currently in progress - will be filled out later once preprint is up on bioRxiv)
MIT — see LICENSE.