Teloscope
============

Introduction
------------
Teloscope is a universal telomere annotation tool. It matches, counts, and reports telomeric repeats in genome assemblies (`.fa` or `.fa.gz`), reports its results in BED/BEDgraph files, and produces a summary report. To install `teloscope`, use:
```sh
git clone https://github.com/vgl-hub/teloscope.git --recursive
cd teloscope
make -j
```

Usage
------------
```sh
teloscope -f input.[fa][.gz] -o [output/dir] -j [threads] -c [canonical] -p [patterns] -w [window size] -s [step size] -d [max-block-dist] -l [min-block-len] -k
```
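The `-w` (window size) and `-s` (step size) options slide a fixed-size window along each sequence, and Teloscope reports per-window metrics such as GC% and Shannon entropy. The Python sketch below illustrates these calculations on a plain string. It is a minimal sketch, not Teloscope's implementation, and it rests on assumptions: windows are 0-based and half-open, the final window is truncated at the sequence end, and both metrics use only A/C/G/T counts (other symbols such as `N` are ignored), with entropy measured in bits.

```python
import math
from collections import Counter


def sliding_windows(seq_len, window=1000, step=500):
    """Yield half-open (start, end) windows over a sequence of length seq_len.

    Assumption: the last window is truncated at the sequence end; how
    Teloscope handles the final partial window is not specified here.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    start = 0
    while start < seq_len:
        end = min(start + window, seq_len)
        yield start, end
        if end == seq_len:  # this window already reaches the sequence end
            break
        start += step


def gc_percent(seq):
    """GC% over unambiguous bases only (assumption: N and other codes are ignored)."""
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def shannon_entropy(seq):
    """Shannon entropy, in bits, of the A/C/G/T composition (0.0 to 2.0)."""
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def window_metrics(seq, window=1000, step=500):
    """Return (start, end, GC%, entropy) for every sliding window."""
    seq = seq.upper()
    return [
        (start, end, gc_percent(seq[start:end]), shannon_entropy(seq[start:end]))
        for start, end in sliding_windows(len(seq), window, step)
    ]


if __name__ == "__main__":
    # Toy sequence: a telomere-like run followed by a mixed-composition region.
    toy = "TTAGGG" * 10 + "ACGT" * 10
    for row in window_metrics(toy, window=40, step=20):
        print(row)
```

A window made only of `TTAGGG` repeats has a skewed base composition (T, A, and G in a 2:1:3 ratio, no C), so its entropy is about 1.46 bits, below the 2 bits of a window with all four bases in equal proportion.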
**Note:** Teloscope automatically searches for the input repeats and their reverse complements. If no repeats are provided, it scans for the canonical CCCTAA/TTAGGG repeats.

Examples
------------
* Example with an explicit list of repeat variants:

```sh
teloscope -f "${file}" -o "${out_path}" -c TTAGGG -p TTAGGG,TCAGGG,TGAGGG,TTGGGG -w 2000 -s 1000 -k
```

* Example with an IUPAC wildcard pattern (`NNNGGG`), 16 threads, and verbose output:

```sh
teloscope -f "${file}" -o "${out_path}" -j 16 -c TTAGGG -p NNNGGG -w 1000 -s 500 -d 200 -l 1000 -k --verbose
```

* Example with several IUPAC-degenerate patterns:

```sh
teloscope -f "${file}" -o "${out_path}" -j 16 -c TTAGGG -p TBAGGG,TTRGGG,YTAGGG -w 2000 -s 1000 -d 200 -l 1000 -k --verbose
```
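The second and third examples pass IUPAC-degenerate patterns such as `NNNGGG` and `TTRGGG`. The Python sketch below shows one way to turn such input into a concrete search set (IUPAC expansion plus the reverse complement of every result) and to match that set with a prefix tree. The helper names are hypothetical, and the matcher is a deliberately simple stand-in: it walks the trie from every start position instead of using an automaton such as Aho-Corasick, and it reports overlapping matches. Teloscope's actual data structures may differ.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expand_iupac(pattern):
    """Return every concrete sequence matched by an IUPAC pattern (KeyError on unknown codes)."""
    choices = [IUPAC[code] for code in pattern.upper()]
    return ["".join(bases) for bases in product(*choices)]


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def build_pattern_set(patterns):
    """Expand IUPAC codes and add the reverse complement of each result."""
    concrete = set()
    for pattern in patterns:
        for seq in expand_iupac(pattern):
            concrete.add(seq)
            concrete.add(reverse_complement(seq))
    return concrete


def build_trie(words):
    """Prefix tree as nested dicts; the None key marks the end of a pattern."""
    root = {}
    for word in words:
        node = root
        for base in word:
            node = node.setdefault(base, {})
        node[None] = word
    return root


def find_matches(seq, trie):
    """Report (start, end, pattern) for every match, including overlaps."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq)):
        node, j = trie, i
        while j < len(seq) and seq[j] in node:
            node = node[seq[j]]
            j += 1
            if None in node:
                hits.append((i, j, node[None]))
    return hits


if __name__ == "__main__":
    patterns = build_pattern_set(["TTAGGG", "TTRGGG"])
    trie = build_trie(patterns)
    print(sorted(patterns))
    print(find_matches("CCCTAACCCTAATTGGGG", trie))
```

Degenerate patterns grow quickly: `NNNGGG` alone expands to 4^3 = 64 concrete patterns before reverse complements are added.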
**Note:** Teloscope accepts IUPAC nucleotide codes in patterns and expands them into all possible pattern combinations.

Parameters
------------
To list all options and flags, run `teloscope -h`:
```
Required Parameters:
'-f' --input-sequence       Initiate tool with fasta/fasta.gz file.
'-o' --output               Set output route.
'-c' --canonical            Set canonical pattern. [Default: TTAGGG]
'-p' --patterns             Set patterns to explore, separate them by commas [Default: TTAGGG]
'-w' --window               Set sliding window size. [Default: 1000]
'-s' --step                 Set sliding window step. [Default: 500]
'-j' --threads              Set the maximum number of threads. [Default: max. available]
'-l' --min-block-length     Set minimum block length for evaluation. [Default: 2000]
'-d' --max-block-distance   Set maximum block distance for merging. [Default: 200]

Optional Parameters:
'-m' --mode                 Set analysis modes, separate them by commas. [Options: all,match,gc,entropy]
'-k' --keep-window-data     Keep window data for analysis, memory-intensive. [Default: false]
'-v' --version              Print current software version.
'-h' --help                 Print current software options.
--verbose                   verbose output.
```
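The `-d` (maximum block distance) and `-l` (minimum block length) parameters control how individual repeat matches are merged into telomere blocks. The Python sketch below illustrates that step under stated assumptions: matches are 0-based, half-open intervals as in BED, neighboring matches are merged when the gap between them is at most `-d` bases, and blocks shorter than `-l` are discarded. Teloscope also filters blocks by positional and conformational properties, which this sketch does not model, and `to_bed` is a hypothetical helper that writes only the first three BED columns.

```python
def merge_blocks(matches, max_dist=200, min_len=2000):
    """Merge (start, end) matches into blocks and keep blocks of length >= min_len.

    Assumptions: intervals are half-open and 0-based (as in BED); two
    neighbors are merged when the gap between them is <= max_dist.
    """
    blocks = []
    for start, end in sorted(matches):
        if blocks and start - blocks[-1][1] <= max_dist:
            # Close enough to (or overlapping) the previous block: extend it.
            blocks[-1][1] = max(blocks[-1][1], end)
        else:
            blocks.append([start, end])
    return [(start, end) for start, end in blocks if end - start >= min_len]


def to_bed(chrom, blocks):
    """Format blocks as minimal three-column BED lines (hypothetical helper)."""
    return "\n".join(f"{chrom}\t{start}\t{end}" for start, end in blocks)


if __name__ == "__main__":
    # Toy matches: a tight run of repeats near the start and one isolated hit.
    matches = [(0, 6), (6, 12), (12, 18), (100, 106), (5000, 5006)]
    blocks = merge_blocks(matches, max_dist=100, min_len=50)
    # Prints one line: scaffold_1, 0, 106 separated by tabs.
    print(to_bed("scaffold_1", blocks))
```

Sorting the matches first lets the merge run in a single pass; overlapping matches produce a negative gap and are merged as well.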
Outputs
------------
Teloscope outputs telomere annotations in BED files, together with per-window metrics and repeat counts. The outputs are:

* `telomere_blocks_all.bed`: Annotation of full telomeres in the assembly, made of both canonical and non-canonical repeats.
* `telomere_blocks_canonical.bed`: Blocks of adjacent canonical repeat matches. Blocks located outside the sequence ends represent interstitial telomeric sequences (ITSs).
* `window_metrics.tsv`: Tab-separated file with per-window metrics, such as GC% and Shannon entropy.
* `window_repeats.bedgraph`: Per-window counts of canonical and non-canonical repeats, and their densities.
* `canonical_matches.bed`: Coordinates of canonical repeats throughout the assembly.
* `noncanonical_matches.bed`: Coordinates of non-canonical repeats in the terminal regions of contigs.

How it works
------------
Briefly, **Teloscope** reads an assembly and decomposes it into its constituent parts. It uses prefix trees and sliding windows to perform multi-pattern string matching and to count telomeric repeats efficiently. It then analyzes the informational properties of the sliding windows to find telomeric blocks. These blocks are collected, post-processed, and filtered according to their positional and conformational properties.

How to cite
------------
If you use **Teloscope** in your research, please cite this repository:
https://github.com/vgl-hub/teloscope/