Nano-smallRNAseq

This repository contains the code for "Nanopore based RNA methylation profiling of a circulating lung cancer biomarker" (see Citation).

Installation

Make sure these tools are installed on your system, as they are called from inside the code:

rsync
guppy_basecaller (for SQK-RNA002)
dorado (for SQK-RNA004)
seqkit
bwa
samtools
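
Since these tools are called from inside the code, it can help to confirm they are on your PATH before starting a run. The following is a minimal sketch and not part of the repository; the helper names `find_missing` and `check_tools` are hypothetical, and it assumes that having either basecaller installed is enough for the kit you use.

```python
import shutil

# Tool names taken from the list above.
REQUIRED_TOOLS = ["rsync", "seqkit", "bwa", "samtools"]
# Assumption: only the basecaller matching your kit is needed.
BASECALLERS = ["guppy_basecaller", "dorado"]


def find_missing(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def check_tools(which=shutil.which):
    """Return a list of human-readable problems (empty if all is fine)."""
    problems = [
        f"missing required tool: {tool}"
        for tool in find_missing(REQUIRED_TOOLS, which)
    ]
    # Complain only if neither basecaller is available.
    if len(find_missing(BASECALLERS, which)) == len(BASECALLERS):
        problems.append("no basecaller found: install guppy_basecaller or dorado")
    return problems


if __name__ == "__main__":
    for problem in check_tools():
        print(problem)
```

The `which` parameter exists only so the check can be exercised without touching the real PATH.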

Comet

Create an account and a project on https://www.comet.com/
Create a file comet_api_key.txt and copy your Comet API key into it
Edit the comet section in the main config file configs/main.yaml
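
A quick sanity check of the key file can catch copy-and-paste mistakes early. This is a sketch only: the function name `read_api_key` is hypothetical, and it assumes the file sits in the directory you run the code from, which the steps above do not state explicitly.

```python
from pathlib import Path


def read_api_key(path="comet_api_key.txt"):
    """Read the key file and return its content without surrounding whitespace."""
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(
            f"{key_file} not found; create it and paste your Comet API key"
        )
    key = key_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{key_file} is empty")
    # A key pasted together with extra text usually contains inner whitespace.
    if any(ch.isspace() for ch in key):
        raise ValueError(f"{key_file} should contain only the key itself")
    return key
```

Calling `read_api_key()` raises an error with a short explanation if the file is missing, empty, or contains more than a single token.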

Environment

Build the nanopore environment with mamba or conda:

mamba create -n nanopore
mamba env update -n nanopore --file environment.yaml
mamba activate nanopore

Training

Configuration

Create YAML config files for the datasets you want to train on (typically one for methylation and one for barcoding) in the folder configs/input/inputset, following the example structure in configs/input/inputset/example.yaml. Each dataset is a collection of fast5 folders each given a label, a FASTA reference file and a name (for logging).
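
To make the pieces of a dataset description concrete, here is a hedged Python sketch of the information such a config carries. It is not the repository's schema: the class `InputSet`, its field names, and the example paths are all hypothetical, and it assumes the reference file and name belong to the dataset as a whole. The authoritative layout is configs/input/inputset/example.yaml.

```python
from dataclasses import dataclass, field


@dataclass
class InputSet:
    """Hypothetical in-memory view of one dataset config."""
    name: str                    # used for logging
    reference_fasta: str         # path to the FASTA reference file
    fast5_folders: dict = field(default_factory=dict)  # folder path -> label

    def validate(self):
        """Return a list of problems found in this description."""
        problems = []
        if not self.name:
            problems.append("name is empty")
        if not self.reference_fasta.lower().endswith((".fa", ".fasta", ".fna")):
            problems.append(
                f"reference does not look like FASTA: {self.reference_fasta}"
            )
        if not self.fast5_folders:
            problems.append("no fast5 folders given")
        for folder, label in self.fast5_folders.items():
            if not label:
                problems.append(f"folder without label: {folder}")
        return problems

    def labels(self):
        """Distinct labels in this dataset, sorted for stable output."""
        return sorted(set(self.fast5_folders.values()))


# Made-up example values, for illustration only.
example = InputSet(
    name="methylation_example",
    reference_fasta="refs/example.fasta",
    fast5_folders={"runs/meth_a": "meth", "runs/unmeth_a": "unmeth"},
)
print(example.validate())  # []
print(example.labels())    # ['meth', 'unmeth']
```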

Run training

To train a model, run python -m ont_hbdx.train. No input data is set by default, so pass a list of input data configs, which will be combined. Existing dataset configs are in configs/input/inputset, and you can add new ones there.

Example with two data folder inputs (filenames without the suffix):

python -m ont_hbdx.train  +input/inputset=[barcoding_mir17,barcoding_miLung]

To run a grid search over parameters that are part of the configs, you can, for example, run the following (note the --multirun flag and the comma-separated list of values for two parameters):

python -m ont_hbdx.train  +input/inputset=[230707_meth,230619_unmeth] input.min_qscore="0,3,7" splitting.n_per_label="1000,10000,100000" --multirun
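
The command above lists three values for each of two parameters. Assuming the sweep is expanded as a full grid (the override syntax and the --multirun flag look like a Hydra-style configuration, though this README does not name the framework), it would launch one run per combination. The sketch below only enumerates those combinations; `expand_grid` is a hypothetical helper, not part of ont_hbdx.

```python
from itertools import product

# Sweep values copied from the command above.
sweep = {
    "input.min_qscore": [0, 3, 7],
    "splitting.n_per_label": [1000, 10000, 100000],
}


def expand_grid(sweep):
    """Return one dict of overrides per combination of swept values."""
    keys = list(sweep)
    value_lists = [sweep[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


jobs = expand_grid(sweep)
print(len(jobs))  # 9
print(jobs[0])    # {'input.min_qscore': 0, 'splitting.n_per_label': 1000}
```

Because the number of runs is the product of the list lengths, adding a third swept parameter with three values would already give 27 runs.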

Testing

To run inference on an outside dataset, run python -m ont_hbdx.test. Set the input data the same way as in training (no sampling or splitting will be done, even if configured). In addition, specify an experiment with the e config flag: go to comet.ml, find the experiment you want to load, copy the experiment ID (the long string of letters and numbers at the end of the URL), and pass it to the e flag. You can pass multiple IDs separated by commas. Currently, the models are not saved online, so this only works with checkpoints from runs that were executed in the same folder.

python -m ont_hbdx.test +input/inputset=[clinical_samples_1,clinical_samples_2] e=?????
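
Copying the experiment ID by hand is error-prone when several experiments are involved. The sketch below takes the last path segment of each experiment URL and joins the results into a value for the e flag. The helper names and the URLs are hypothetical, with made-up IDs; the only thing taken from the text above is that the ID sits at the end of the URL.

```python
from urllib.parse import urlparse


def experiment_id_from_url(url):
    """Return the last non-empty path segment, ignoring query and fragment."""
    path = urlparse(url).path
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"no path segment in URL: {url}")
    return segments[-1]


def e_override(urls):
    """Build the value for the e flag from one or more experiment URLs."""
    return "e=" + ",".join(experiment_id_from_url(url) for url in urls)


# Hypothetical URLs with made-up IDs, for illustration only.
urls = [
    "https://www.comet.com/my-workspace/my-project/0123456789abcdef",
    "https://www.comet.com/my-workspace/my-project/fedcba9876543210?tab=metrics",
]
print(e_override(urls))  # e=0123456789abcdef,fedcba9876543210
```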

Citation

Nanopore based RNA methylation profiling of a circulating lung cancer biomarker

Marta Sanchez-Delgado 1, Maurice Frank 1, Tomáš Šišmiš 2, Mustafa Kahraman 1, Alberto Daniel-Moreno 1, Emmika Mummery 1, Jessika Ceiler 1, Jasmin Skottke 1, Carla Bieg-Salazar 1, Franziska Hinkfoth 1, Christina Rudolf 1, Ronja Weiblen 1, Kaja Tikk 1, Tobias Sikosek 1, Bruno R Steinkraus 1, Rastislav Horos 1, Michal Urda 2, Timothy Rajakumar 1

1 Hummingbird Diagnostics GmbH, Heidelberg, Germany

2 Department of Pneumology and Phtiseology, University Hospital and Polyclinic F.D. Roosevelt Banská Bystrica, Banská Bystrica, Slovakia
