rcCAE

rcCAE is a convolutional autoencoder (CAE) based method for detecting tumor clones and copy number alterations from single-cell DNA sequencing data.

Requirements

Python 3.8+.

Installation

Clone repository

First, download rcCAE from GitHub and change into the repository directory:

git clone https://github.com/zhyu-lab/rccae
cd rccae

Create conda environment (optional)

Create a new environment named "rccae":

conda create --name rccae python=3.8.15

Then activate it:

conda activate rccae

Install requirements

To compile the source code successfully, make sure g++-5 is installed on your machine. You can install it using the following commands:

sudo apt install gcc-5 g++-5
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 20  
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 20

If your default compiler is not g++-5, switch the default to gcc-5 and g++-5:

sudo update-alternatives --config gcc
sudo update-alternatives --config g++

Then install related packages and compile the code:

python -m pip install -r ./cae/requirements.txt
cd prep
sudo cmake .
make
cd ..
chmod +x run_rccae.sh ./hmm/run_SCHMM.sh

If the installed PyTorch version does not match your CUDA version, consult the compatibility list at https://pytorch.org/get-started/previous-versions/ and install a PyTorch version appropriate for your machine.
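To see which CUDA version the installed PyTorch build targets, a short check such as the following may help. It is a minimal sketch, not part of rcCAE; the function name `describe_torch_cuda` is made up for illustration.

```python
import importlib.util


def describe_torch_cuda():
    """Return a one-line summary of the installed PyTorch build and its CUDA support."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"
    import torch

    # torch.version.cuda is None for CPU-only builds.
    return (
        f"torch {torch.__version__}, "
        f"built for CUDA {torch.version.cuda}, "
        f"GPU usable: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(describe_torch_cuda())
```

If the reported CUDA version differs from the one installed on the machine, or the GPU is reported as unusable, reinstalling PyTorch with a matching build is the likely fix.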

After the installation completes, you can reset the compiler using the same commands:

sudo update-alternatives --config gcc
sudo update-alternatives --config g++

Usage

Step 1: prepare input data

The “./prep/bin/prepInput” command is used to obtain read counts, GC-content and mappability data.

To run the command, you will need to obtain or create the following items:

A merged BAM file (10X Genomics) containing sequencing data of all cells, or separate BAM files of single cells
A FASTA file defining reference sequence
A BIGWIG file for calculating mappability scores
A barcode file listing the barcodes of all cells or names of all BAM files to be analyzed
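When the cells are stored as separate BAM files, the list of file names can be generated rather than typed by hand. The sketch below assumes one name per line and keeps the ".bam" extension; both are assumptions about the expected layout and should be checked against the example data in the repository. The function names are hypothetical.

```python
import os


def list_bam_names(bam_dir):
    """Return the sorted file names in bam_dir that end with '.bam'."""
    return sorted(name for name in os.listdir(bam_dir) if name.endswith(".bam"))


def write_name_list(bam_dir, list_path):
    """Write one BAM file name per line (assumed layout) and return the names."""
    names = list_bam_names(bam_dir)
    with open(list_path, "w") as out:
        for name in names:
            out.write(name + "\n")
    return names
```

Index files such as "cell1.bam.bai" are skipped because only names ending in ".bam" are kept.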

Reference sequence files in .fasta format can be downloaded from the UCSC Genome Browser.

Mappability files in .bw format for human genomes are available from the UCSC Genome Browser. Users can also generate their own mappability files using gem-library and the wigToBigWig utility.

Here is an example of creating a mappability file from a reference sequence (assuming “$gemlibrary” and “$bigwig” are the paths where gem-library and wigToBigWig are installed, respectively).

chmod +x $gemlibrary/bin/gem* $bigwig/wigToBigWig
export PATH=$PATH:$gemlibrary/bin:$bigwig
gem-indexer -T 10 -c dna -i ./testData/refs/example.fa -o ./testData/refs/example_index
gem-mappability -T 10 -I ./testData/refs/example_index.gem -l 36 -o ./testData/refs/example_36
gem-2-wig -I ./testData/refs/example_index.gem -i ./testData/refs/example_36.mappability -o ./testData/refs/example_36
wigToBigWig ./testData/refs/example_36.wig ./testData/refs/example.sizes ./testData/refs/example_36.bw
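The wigToBigWig call above reads a chromosome sizes file (example.sizes), a two-column text file giving each sequence name and its length. If such a file is not already at hand, it can be derived from the reference FASTA. The following is a minimal sketch for uncompressed FASTA files; the function names are hypothetical and not part of rcCAE.

```python
def fasta_chrom_sizes(fasta_path):
    """Return [(sequence_name, length), ...] in file order for a plain-text FASTA file."""
    sizes = []
    name, length = None, 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    sizes.append((name, length))
                # Keep only the first word of the header as the sequence name.
                parts = line[1:].split()
                name, length = (parts[0] if parts else ""), 0
            elif name is not None:
                length += len(line)
    if name is not None:
        sizes.append((name, length))
    return sizes


def write_chrom_sizes(fasta_path, sizes_path):
    """Write one 'name<TAB>length' line per sequence."""
    with open(sizes_path, "w") as out:
        for name, length in fasta_chrom_sizes(fasta_path):
            out.write(f"{name}\t{length}\n")
```

For example, `write_chrom_sizes("./testData/refs/example.fa", "./testData/refs/example.sizes")` would produce the sizes file used in the last command.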

All arguments of the “./prep/bin/prepInput” command are as follows:

Parameter	Description	Possible values
-b, --bam	a merged BAM file (10X Genomics) or a directory containing BAM files of the cells to be analyzed	Ex: /path/to/sample.bam
-r, --ref	genome reference file (.fasta)	Ex: /path/to/hg19.fa
-m, --map	mappability file (.bw)	Ex: /path/to/hg19.bw
-B, --barcode	a file listing the barcodes of all cells or names of all BAM files to be analyzed	Ex: /path/to/barcodes.txt
-c, --chrlist	the chromosomes to be analyzed (separated by commas)	Ex: chrlist=1,2,3,4,5 default:1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22
-s, --binsize	size of the bins used to count reads	Ex: binsize=200000 default:500000
-q, --mapQ	threshold value for mapping quality	Ex: mapQ=10 default:20
-o, --output	output file to save results	Ex: /path/to/example.txt

Example:

./prep/bin/prepInput -b /path/to/sample.bam -r /path/to/hg19.fasta -m /path/to/hg19.bw -B /path/to/barcodes.txt -o example.txt
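When prepInput is called from a script, assembling the argument list programmatically avoids quoting mistakes. The sketch below uses only the short flags from the table above and assumes that the optional flags (-c, -s, -q) take their value as a separate argument, as -b and -r do in the example. The helper names are hypothetical.

```python
import subprocess


def build_prep_command(bam, ref, mappability, barcodes, output,
                       chrlist=None, binsize=None, mapq=None,
                       executable="./prep/bin/prepInput"):
    """Build the prepInput argument list; optional flags are added only when set."""
    cmd = [executable, "-b", bam, "-r", ref, "-m", mappability,
           "-B", barcodes, "-o", output]
    if chrlist is not None:
        cmd += ["-c", ",".join(str(c) for c in chrlist)]
    if binsize is not None:
        cmd += ["-s", str(binsize)]
    if mapq is not None:
        cmd += ["-q", str(mapq)]
    return cmd


def run_prep(**kwargs):
    """Run prepInput and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_prep_command(**kwargs), check=True)
```

Calling `run_prep(bam="/path/to/sample.bam", ref="/path/to/hg19.fasta", mappability="/path/to/hg19.bw", barcodes="/path/to/barcodes.txt", output="example.txt")` corresponds to the command line shown in the example.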

Step 2: train the CAE model

The “./cae/train.py” Python script is used to learn latent representations of cells and cluster cells into distinct subpopulations.

The arguments to run “./cae/train.py” are as follows:

Parameter	Description	Possible values
--input	input file generated by “./prep/bin/prepInput” command	Ex: /path/to/example.txt
--output	a directory to save results	Ex: /path/to/results
--epochs	number of epochs to train the CAE	Ex: epochs=200 default:100
--batch_size	batch size	Ex: batch_size=32 default:64
--lr	learning rate	Ex: lr=0.0005 default:0.0001
--max_k	maximum number of clusters to consider	Ex: max_k=30 default:N/5 (N is the number of cells)
--latent_dim	latent dimension	Ex: latent_dim=4 default:3
--kernel_size	convolutional kernel size	Ex: kernel_size=5 default:7
--seed	random seed	Ex: seed=1 default:0

Example:

python ./cae/train.py --input ./data/example.txt --epochs 200 --batch_size 64 --lr 0.0001 --latent_dim 3 --seed 0 --output data
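To compare several settings, for instance different latent dimensions or random seeds, the training command can be generated once per combination. The sketch below only builds the command lines using the flags documented above. The per-run directory naming ("latent3_seed0" and so on) is an assumption made for illustration, and whether train.py creates missing output directories is not stated here, so creating them beforehand may be necessary.

```python
import itertools
import os


def build_train_commands(input_file, output_root,
                         latent_dims=(3,), seeds=(0,),
                         script="./cae/train.py"):
    """Return one train.py argument list per (latent_dim, seed) combination."""
    commands = []
    for latent_dim, seed in itertools.product(latent_dims, seeds):
        # Hypothetical naming scheme: one result directory per setting.
        out_dir = os.path.join(output_root, f"latent{latent_dim}_seed{seed}")
        commands.append([
            "python", script,
            "--input", input_file,
            "--output", out_dir,
            "--latent_dim", str(latent_dim),
            "--seed", str(seed),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`; all parameters not listed keep the defaults given in the table.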

Step 3: detect single-cell CNAs

The “./hmm/SCHMM.m” MATLAB script is used to call single-cell CNAs.

The arguments to run “./hmm/SCHMM.m” are as follows:

Parameter	Description	Possible values
inputFile	“lrc.txt” file generated by “./cae/train.py” script	Ex: /path/to/lrc.txt
outputDir	a directory to save results	Ex: /path/to/results
maxCN	maximum copy number to consider	Ex: 6 default:10

Example:

SCHMM('../data/lrc_example.txt','../data',10)

We also provide a script, “run_rccae.sh”, that integrates all three steps of rcCAE. This script requires MATLAB Compiler Runtime (MCR) v91 (R2016b) to be installed on the user's machine. The MCR can be downloaded from the MathWorks website.

Example:

./run_rccae.sh /path/to/bam /path/to/ref.fa /path/to/ref.bw /path/to/barcodes.txt /path/to/results

Type ./run_rccae.sh for details on how to use this script.

Contact

If you have any questions, please contact [email protected].
