HGSOC Adipocytes Deconvolution

Deconvolution of bulk RNA-seq of High Grade Serous Ovarian Carcinoma (HGSOC) incorporating adipocytes for survival analysis

High Grade Serous Ovarian Carcinoma (HGSOC) is characterized by heterogeneity at the cellular level. Deconvolution methods can estimate cell-type composition from bulk RNA-seq data using single-cell references. However, adipocytes are often underrepresented in single-cell RNA-seq. This project integrates single-nucleus RNA-seq adipocyte data with HGSOC single-cell references and applies InstaPrism for deconvolution of bulk RNA-seq and microarray datasets from Black and White patient cohorts: “SchildkrautB” and “SchildkrautW”.
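
As a rough orientation, reference-based deconvolution can be read as a mixing model: the bulk expression of each gene is approximated by a weighted sum of cell-type reference profiles, and the weights are the cell-type proportions being estimated. The formula below is a simplified sketch, not InstaPrism's own statistical model, which is more detailed. The symbols are notation introduced here for illustration: x_g is the bulk expression of gene g in one sample, \phi_{kg} is the reference expression of gene g in cell type k, and \theta_k is the estimated fraction of cell type k.

```latex
x_g \;\approx\; \sum_{k=1}^{K} \theta_k \, \phi_{kg},
\qquad \theta_k \ge 0, \qquad \sum_{k=1}^{K} \theta_k = 1
```

In this view, adding adipocytes to the reference corresponds to adding one more cell type k, with its own profile \phi_{kg}, to the sum.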

Features

Preprocessing of bulk RNA-seq.
Transcriptomic subtypes: consensusOV.
Construction of integrated single-cell/single-nucleus reference (adipocytes).
Cox proportional hazards survival analysis.
Clinical metadata comparisons (race, age, BMI, FIGO stage, etc.)

Running the full pipeline

Data Preparation

Place all raw data files in input_data/, following the structure described in the Data Requirements section. If necessary, use scripts/unzip_input_data.py to decompress archives (this is not necessary when running through run_pipeline.sh).

Clone the repository:

```
git clone https://github.com/ivichadriana/HGSOC_Adipocytes.git
cd HGSOC_Adipocytes
```

Run the pipeline:
```
bash run_pipeline.sh
```

This script creates the environment (Linux and macOS are currently supported) and runs the full analysis. You can also go through the scripts and run each step individually; they are numbered.

Workflow Overview

Create appropriate environments: 0_create_environment.sh creates conda R and Python environments.
Decompress raw data: unzip_input_data.py decompresses the raw data archives.
Process bulk datasets: 1_process_data_and_subtypes.R filters, transforms, and clusters data.
Prepare reference matrix: 2_prepare_deconvolution.R generates the combined single-cell/single-nucleus reference for InstaPrism.
Run deconvolution: 3_run_deconvolution.R applies InstaPrism to the bulk datasets to estimate cell-type proportions.
Notebooks with analysis: notebooks/analysis_X.ipynb. Each notebook contains a distinct analysis; see the file names for details.
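
The script steps above can be sketched as an ordered list of commands. This is a minimal illustration, not the contents of run_pipeline.sh: the helper name build_commands, the assumption that every script lives in scripts/, and the interpreter chosen for each file extension are all hypothetical, and the sketch does not activate the conda environments the real pipeline creates.

```python
from pathlib import Path

# Step order taken from the Workflow Overview. That all of these files sit
# in scripts/ is an assumption (only unzip_input_data.py is stated to).
STEPS = [
    "0_create_environment.sh",
    "unzip_input_data.py",
    "1_process_data_and_subtypes.R",
    "2_prepare_deconvolution.R",
    "3_run_deconvolution.R",
]

# Assumed interpreter per file extension; conda activation is not handled.
INTERPRETERS = {".sh": "bash", ".py": "python", ".R": "Rscript"}


def build_commands(scripts_dir="scripts"):
    """Return one [interpreter, script_path] command per step, in order."""
    commands = []
    for name in STEPS:
        path = Path(scripts_dir) / name
        commands.append([INTERPRETERS[path.suffix], path.as_posix()])
    return commands
```

Each returned command could be passed to subprocess.run(cmd, check=True) from the repository root, which stops at the first failing step.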

Notebooks

Interactive analyses are provided in notebooks/:

analysis_X.ipynb: runs analysis X, where X is the comparison being made (e.g., proportions vs. survival).

Results

All generated files from deconvolution are saved under output_data/, and the visualizations and analysis are saved in the notebooks/ folder.

Data Requirements

Bulk HGSOC (Schildkraut) datasets and clinical data.

SchildkrautB (Black patients; clinical metadata & raw counts)
SchildkrautW (White patients; clinical metadata & raw counts)

Available upon request. Please email [email protected] for access.

Reference gene lists & clustering metadata

From greenelab/hgsc_characterization:

Reference mappings:
reference_data/ensembl_hgnc_entrez.tsv

Gene list & cluster assignments:
data/way_pipeline_results_10removed_NeoRemoved_inclWhites/1.DataInclusion-Data-Genes/GlobalMAD_genelist.csv
data/way_pipeline_results_10removed_NeoRemoved_inclWhites/2.Clustering_DiffExprs-Tables-ClusterMembership/FullClusterMembership.csv

Single‑cell RNA‑seq (HGSOC reference for deconvolution)

GEO accession: GSE217517
Replicates 1–8 (GSM6720925–GSM6720932) each include:

GSMxxxxxxx_single_cell_barcodes_<n>.tsv.gz  
GSMxxxxxxx_single_cell_features_<n>.tsv.gz  
GSMxxxxxxx_single_cell_matrix_<n>.mtx.gz
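
Before running the pipeline, it can help to confirm that all three files are present for each of the eight accessions. The sketch below is one way to do that, under stated assumptions: the function names accessions and missing_patterns are hypothetical, the files are assumed to sit directly in input_data/, and the `<n>` part of each name is matched with a wildcard rather than guessed.

```python
from pathlib import Path

FIRST_GSM = 6720925  # first accession in the range GSM6720925-GSM6720932
N_SAMPLES = 8
# File kinds and extensions from the naming scheme listed above.
KINDS = [("barcodes", "tsv.gz"), ("features", "tsv.gz"), ("matrix", "mtx.gz")]


def accessions():
    """Return the eight consecutive GSM accessions as strings."""
    return [f"GSM{FIRST_GSM + i}" for i in range(N_SAMPLES)]


def missing_patterns(input_dir="input_data"):
    """Return the glob patterns that match no file in input_dir."""
    root = Path(input_dir)
    missing = []
    for gsm in accessions():
        for kind, ext in KINDS:
            # "*" stands in for the <n> suffix, which is not assumed here.
            pattern = f"{gsm}_single_cell_{kind}_*.{ext}"
            if not any(root.glob(pattern)):
                missing.append(pattern)
    return missing
```

An empty list from missing_patterns() means every expected barcodes, features, and matrix file was found; otherwise the list names the patterns still to be downloaded.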

Cell‑type labels

From greenelab/deconvolution_pilot:

data/cell_labels/
  ├── 2251_labels.txt
  ├── 2267_labels.txt
  ├── 2283_labels.txt
  ├── 2293_labels.txt
  ├── 2380_labels.txt
  ├── 2428_labels.txt
  ├── 2467_labels.txt
  └── 2497_labels.txt

Single‑nucleus adipocyte RNA‑seq

GEO accession: GSE176171
Samples GSM5359325–GSM5820686, for example:

GSM5359325_Hs_OAT_01-1.dge.tsv.gz  
…  
GSM5820686_Hs_SAT_11-1.dge.tsv.gz

Directory layout:

project-root/
├── renv/
├── scripts/
├── input_data/    ← place all downloaded files here  
└── output_data/   ← script-generated results

Contributing and Issues

Please open issues or pull requests for improvements. We aim to be responsive in the Issues section.

License

This project is licensed under the BSD-3-Clause License.
