






A method for measuring allele-specific telomere length (TL) and characterizing telomere variant repeat (TVR) sequences from long reads.

If this software has been useful for your work, please cite:

Stephens, Z., & Kocher, J. P. (2024). Characterization of telomere variant repeats using long reads enables allele-specific telomere length estimation. BMC Bioinformatics, 25(1), 194.


Telogator2 dependencies can be easily installed via conda:

# create conda environment
conda env create -f conda_env_telogator2.yaml

# activate environment
conda activate telogator2

Running Telogator2:

python -i input.fq \
                     -o results/ \
                     --minimap2 /path/to/minimap2

-i accepts FASTA (fa, fa.gz), FASTQ (fq, fq.gz), or BAM input. Multiple files can be provided, e.g. -i reads1.fa reads2.fa. For Revio reads sequenced with SMRT Link 13 or later, we advise providing both the "hifi" BAM and the "fail" BAM as input to Telogator2.
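As a pre-flight check before a long run, a small helper can confirm that every file passed to -i has one of the extensions listed above. This is a hypothetical sketch, not part of Telogator2: the function names are illustrative, case-insensitive matching is an assumption, and Telogator2 itself may accept other extensions.

```python
from pathlib import Path

# Input extensions listed in the README for -i (matched case-insensitively here, an assumption).
ACCEPTED_SUFFIXES = {
    ".fa": "fasta",
    ".fa.gz": "fasta",
    ".fq": "fastq",
    ".fq.gz": "fastq",
    ".bam": "bam",
}


def classify_input(path: str) -> str:
    """Return 'fasta', 'fastq', or 'bam' for a supported input path; raise ValueError otherwise."""
    name = Path(path).name.lower()
    for suffix, kind in ACCEPTED_SUFFIXES.items():
        if name.endswith(suffix):
            return kind
    raise ValueError(f"unsupported input format: {path}")


def check_inputs(paths: list[str]) -> dict[str, str]:
    """Classify every file that would be passed to -i, failing on the first unsupported one."""
    return {p: classify_input(p) for p in paths}


if __name__ == "__main__":
    print(check_inputs(["reads1.fa", "reads2.fq.gz", "sample.hifi_reads.bam"]))
```

For a Revio run, both BAMs would pass this check, so the same list can then be joined into the -i argument.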

An aligner executable must be specified via --minimap2, --winnowmap, or --pbmm2.
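The aligner requirement can be sketched as a helper that resolves the executable and pairs it with the matching flag. The function name is hypothetical, and the PATH lookup assumes each binary is named after its aligner (minimap2, winnowmap, pbmm2); passing an explicit path skips that assumption.

```python
import shutil
from typing import Optional

# Aligner choices named in the README; exactly one must be given to Telogator2.
ALIGNER_FLAGS = {"minimap2": "--minimap2", "winnowmap": "--winnowmap", "pbmm2": "--pbmm2"}


def aligner_args(aligner: str, executable: Optional[str] = None) -> list[str]:
    """Return [flag, path] for the chosen aligner.

    Without an explicit path, the executable is looked up on PATH with shutil.which,
    assuming the binary carries the aligner's name (e.g. 'minimap2').
    """
    if aligner not in ALIGNER_FLAGS:
        raise ValueError(f"aligner must be one of {sorted(ALIGNER_FLAGS)}")
    path = executable or shutil.which(aligner)
    if path is None:
        raise FileNotFoundError(f"{aligner} not found on PATH; pass its full path instead")
    return [ALIGNER_FLAGS[aligner], path]
```

Failing early with FileNotFoundError is cheaper than discovering a missing aligner after reads have been loaded.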

Recommended settings

Because sequencing platforms have different error profiles, we recommend running Telogator2 with platform-specific options:

  • PacBio Revio HiFi (30x): -r hifi -n 4
  • PacBio Sequel II (10x): -r hifi -n 3
  • Nanopore R10 (30x): -r ont -n 4

For Nanopore reads generated using telomere enrichment methods, such as those described by Karimian et al., we recommend using -r ont -n 5 -tt 0.100 --collapse-hom 1000.
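When scripting several runs, the recommendations above can be kept as named presets. The preset labels and the helper are hypothetical; the option values are copied verbatim from this section.

```python
# Option lists copied from the recommended settings; the dictionary keys are hypothetical labels.
PRESETS = {
    "revio_hifi_30x": ["-r", "hifi", "-n", "4"],
    "sequel2_hifi_10x": ["-r", "hifi", "-n", "3"],
    "ont_r10_30x": ["-r", "ont", "-n", "4"],
    "ont_telomere_enriched": ["-r", "ont", "-n", "5", "-tt", "0.100", "--collapse-hom", "1000"],
}


def preset_args(platform: str) -> list[str]:
    """Return a copy of the recommended options so callers can append to it safely."""
    try:
        return list(PRESETS[platform])
    except KeyError:
        raise ValueError(f"unknown platform {platform!r}; choose from {sorted(PRESETS)}") from None


if __name__ == "__main__":
    print(" ".join(preset_args("ont_telomere_enriched")))  # -r ont -n 5 -tt 0.100 --collapse-hom 1000
```

Returning a copy keeps the shared presets unchanged when a caller adds input, output, or aligner arguments.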

Telogator2 may be unable to analyze older Nanopore data, as reads basecalled with Guppy have prohibitively high sequencing error rates in telomere regions.

Test data

Telomere reads for HG002 can be found in the test_data/ directory. These are full-sized datasets and may take several hours to process.

  • HiFi reads (~70x): hg002-telreads_pacbio.fa.gz
  • ONT reads (~25x): hg002-telreads_ont.fa.gz
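Because the full datasets take hours to run, it can be worth confirming that a downloaded file is intact before starting. The sketch below is not part of Telogator2; it counts records and bases in a plain or gzipped FASTA file using only the Python standard library.

```python
import gzip


def fasta_stats(path: str) -> tuple[int, int]:
    """Return (number of records, total bases) for a plain or gzipped FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    n_reads = 0
    n_bases = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                n_reads += 1  # each header line starts a new record
            else:
                n_bases += len(line.strip())  # sequence lines may be wrapped
    return n_reads, n_bases
```

For example, fasta_stats("test_data/hg002-telreads_ont.fa.gz") returns a (reads, bases) tuple; a truncated gzip download raises an exception instead of returning counts.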

A smaller dataset is also provided, which should take no more than a couple of minutes to process:

python -i test_data/hg002-ont-1p.fa.gz \
                     -o results/ \
                     -r ont \
                     --minimap2 /path/to/minimap2

Output files

The primary output files are:

  • tlens_by_allele.tsv: allele-specific telomere lengths (ATLs)
  • all_final_alleles.png: plots of all alleles (TVR + telomere regions)
  • violin_atl.png: violin plot of ATLs at each chromosome arm

The main results are in tlens_by_allele.tsv, which has the following columns:

  • chr: anchor chromosome arm
    • subtelomeres that could not be aligned are labeled chrU ("unmapped")
  • position: anchor coordinate
  • ref_samp: the specific T2T reference contig to which the subtelomere was aligned
  • allele_id: ID number of this allele
    • IDs ending in "i" indicate subtelomeres that aligned to known interstitial telomere regions; these alleles should likely be excluded from downstream analyses.
  • TL_p75: ATL (by default, the 75th percentile of the supporting reads' telomere lengths)
  • read_TLs: telomere length of each supporting read in the cluster
  • read_lengths: length of each read in the cluster
  • read_mapq: mapping quality of each read in the cluster
  • tvr_len: length of the cluster's TVR region
  • tvr_consensus: consensus sequence of the TVR region
  • supporting_reads: names of the reads in the cluster
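For downstream analysis, the sketch below loads tlens_by_allele.tsv with the standard csv module, drops interstitial alleles (IDs ending in "i") and, optionally, unmapped chrU arms, and recomputes a 75th percentile from read_TLs. Two assumptions are flagged in the code: that read_TLs is comma-separated, and that linear interpolation matches how Telogator2 computes TL_p75. Treat the recomputed value as a cross-check, not a replacement for the reported one.

```python
import csv
import math


def percentile(values: list[float], q: float) -> float:
    """Percentile by linear interpolation between order statistics (numpy's default rule); q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty list")
    pos = (len(ordered) - 1) * q / 100.0
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def load_alleles(path: str, drop_unmapped: bool = False) -> list[dict]:
    """Read tlens_by_allele.tsv, skipping interstitial alleles (allele_id ending in 'i')."""
    rows = []
    with open(path, newline="") as handle:
        for raw in csv.DictReader(handle, delimiter="\t"):
            # ASSUMPTION: the header may carry a leading '#' (e.g. '#chr'); strip it from keys.
            row = {key.lstrip("#"): value for key, value in raw.items() if key is not None}
            if row["allele_id"].endswith("i"):
                continue
            if drop_unmapped and row["chr"].startswith("chrU"):
                continue
            rows.append(row)
    return rows


def read_tls(row: dict) -> list[float]:
    """Per-read telomere lengths. ASSUMPTION: read_TLs is a comma-separated list of numbers."""
    return [float(x) for x in row["read_TLs"].split(",") if x.strip()]
```

A typical loop would call load_alleles("results/tlens_by_allele.tsv", drop_unmapped=True) and compare percentile(read_tls(row), 75) with float(row["TL_p75"]) for each allele.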

Telogator reference

The reference sequence used for telomere anchoring currently contains the first and last 500 kb of each sequence from the following T2T assemblies:

More will be added as they become available.
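Conceptually, each anchor sequence is a fixed-size window at one end of a T2T contig. The sketch below shows that windowing, assuming contigs are already loaded as name-to-sequence strings and that 500 kb means 500,000 bp. It is not the script used to build the shipped reference, and the "_p"/"_q" suffixes are a hypothetical naming scheme.

```python
def telomere_flanks(name: str, seq: str, flank: int = 500_000) -> list[tuple[str, str]]:
    """Return the first and last `flank` bases of a contig as two named records.

    The '_p'/'_q' suffixes are hypothetical. For contigs shorter than 2*flank the
    two windows overlap; for contigs shorter than flank both are the whole sequence.
    """
    if flank <= 0:
        raise ValueError("flank must be a positive number of bases")
    return [(f"{name}_p", seq[:flank]), (f"{name}_q", seq[-flank:])]


def build_anchor_reference(contigs: dict[str, str], flank: int = 500_000) -> list[tuple[str, str]]:
    """Collect both end windows for every contig, preserving input order."""
    records = []
    for name, seq in contigs.items():
        records.extend(telomere_flanks(name, seq, flank))
    return records


def to_fasta(records: list[tuple[str, str]], width: int = 60) -> str:
    """Format records as FASTA text with fixed-width sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Keeping only the chromosome ends keeps the anchoring reference small while still covering the subtelomeres that telomeric reads extend into.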

Mouse reference

Experimental support has been added for the T2T-mouse genome:

python -i input.fq \
                     -o results/ \
                     -t resources/telogator-ref-mouse.fa.gz \
                     --minimap2 /path/to/minimap2

If you use Winnowmap as the aligner, you will also need to add --winnowmap-k15 resources/telogator-ref-mouse-k15.txt.
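The mouse-specific options can also be assembled programmatically. The resource paths are the ones given above; the function name is hypothetical.

```python
# Paths exactly as given in the mouse instructions.
MOUSE_REF = "resources/telogator-ref-mouse.fa.gz"
MOUSE_K15 = "resources/telogator-ref-mouse-k15.txt"
ALIGNER_FLAGS = {"minimap2": "--minimap2", "winnowmap": "--winnowmap", "pbmm2": "--pbmm2"}


def mouse_args(aligner: str, aligner_path: str) -> list[str]:
    """Options for running against the experimental T2T-mouse anchor reference."""
    if aligner not in ALIGNER_FLAGS:
        raise ValueError(f"aligner must be one of {sorted(ALIGNER_FLAGS)}")
    args = ["-t", MOUSE_REF, ALIGNER_FLAGS[aligner], aligner_path]
    if aligner == "winnowmap":
        # Winnowmap runs also need the mouse-specific k15 file.
        args += ["--winnowmap-k15", MOUSE_K15]
    return args
```

For example, mouse_args("winnowmap", "/path/to/winnowmap") returns ['-t', 'resources/telogator-ref-mouse.fa.gz', '--winnowmap', '/path/to/winnowmap', '--winnowmap-k15', 'resources/telogator-ref-mouse-k15.txt'].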


