Marcos-Fernando/AnnoTEP

Introduction

AnnoTEP is a platform for annotating transposable elements (TEs) in plant genomes. It is built on the EDTA pipeline and incorporates modifications inspired by the Plant genome Annotation pipeline, along with adjustments that improve its performance and flexibility. A key differentiator of AnnoTEP is its graphical user interface (GUI), developed with HTML and Python, which makes the process accessible even to researchers with limited command-line experience. The tool has also proven effective in the analysis of algae and microalgae genomes.

In addition to its GitHub repository, AnnoTEP also has a website that centralises its documentation, displays the genome mutation rate table, and showcases a selection of pre-processed genomes using the tool.

Functions of AnnoTEP

  • Enhancement in the detection of LTRs, LINEs, TIRs, and Helitrons.
  • Improved identification and classification of non-autonomous LTRs, such as TRIM, LARD, TR-GAG, and BARE2.
  • Detection of solo LTRs.
  • Classification of lineages belonging to the Copia and Gypsy superfamilies.
  • Classification of Helitrons into autonomous and non-autonomous.
  • Optimisation of repetitive sequence masking.
  • Generation of transposable element (TE) classification reports.
  • Creation of repeat landscape plots, histograms, and phylogenetic trees of LTR lineages.

Installing and configuring environments

AnnoTEP can be installed in different ways, depending on your preferences and needs. In this tutorial, we will guide you through two main installation methods: the traditional method and installation via Docker. Both methods are detailed to ensure a smooth and efficient setup on your machine.

Installing with library and conda

Note

Prerequisites

Important

System requirements
Minimum requirements for both versions, for genomes up to 1 GB:

  • Threads: 20
  • RAM: 50GB
  • Storage: 1TB

More resources are recommended for larger genomes.
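As a quick pre-flight check, the minimums above can be compared against the machine you plan to use. The following is a minimal sketch, not part of AnnoTEP: the function names `shortfalls` and `probe_machine` are hypothetical, 1 TB is taken as 1000 GB, and the RAM probe assumes Linux.

```python
import os
import shutil

# Minimums quoted above for genomes up to 1 GB.
MIN_THREADS = 20
MIN_RAM_GB = 50
MIN_STORAGE_GB = 1000  # 1 TB, taken here as 1000 GB (assumption)


def shortfalls(threads, ram_gb, storage_gb):
    """Return one message for every minimum that is not met."""
    problems = []
    if threads < MIN_THREADS:
        problems.append(f"threads: {threads} < {MIN_THREADS}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.0f} GB < {MIN_RAM_GB} GB")
    if storage_gb < MIN_STORAGE_GB:
        problems.append(f"storage: {storage_gb:.0f} GB < {MIN_STORAGE_GB} GB")
    return problems


def probe_machine(path="."):
    """Read thread count, total RAM and free disk space (Linux only)."""
    threads = os.cpu_count() or 1
    ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    storage_gb = shutil.disk_usage(path).free / 1e9
    return threads, ram_gb, storage_gb
```

Calling `shortfalls(*probe_machine())` returns an empty list when the machine meets all three minimums.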

Tip

MiniConda Installation
After downloading Miniconda from the link above, run the following command in your terminal:

bash Miniconda3-latest-Linux-x86_64.sh

Configuring the repository

📚 Installing Required Libraries

Step 1. Install the necessary libraries by running the following commands in your terminal:

sudo apt-get install libgdal-dev lib32z1 python-is-python3 python3-setuptools python3-biopython python3-xopen trf hmmer2 seqtk libtext-soundex-perl
sudo apt-get install hmmer emboss python3-virtualenv cd-hit iqtree build-essential linux-generic libmpich-dev libopenmpi-dev bedtools pullseq bioperl
sudo apt-get install pdf2svg

# R dependencies
sudo apt-get install r-cran-ggplot2 r-cran-tidyr r-cran-reshape2 r-cran-reshape rs r-cran-viridis r-cran-tidyverse r-cran-gridextra r-cran-gdtools r-cran-phangorn r-cran-phytools r-cran-ggrepel

Access the R program from the terminal and install libraries from within it:

R

install.packages("hrbrthemes")

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ggtree")
BiocManager::install("ggtreeExtra")

Tip

Alternative Method: If you encounter errors with BiocManager, ggtree, or ggtreeExtra, use the following approach:

if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("YuLab-SMU/ggtree")
devtools::install_github("YuLab-SMU/ggtreeExtra")

Step 2. Copy the break_fasta.pl script to /usr/local/bin:

sudo cp Scripts/break_fasta.pl /usr/local/bin

⚙️ Configuring modified EDTA

📌 AnnoTEP uses the same installation method as EDTA. To set up the environment, navigate to the AnnoTEP folder and follow the steps below:

# Navigate to the EDTA directory
cd EDTA

conda env create -f EDTA_2.2.x.yml
conda activate EDTA2
perl EDTA.pl -h

Important

📌 FOR NVIDIA GPU SERVERS ONLY
The TIR Learner in EDTA may not work correctly on GPU servers. To resolve this, follow the instructions below to install EDTA correctly:

mamba create -n EDTA2.2 -c conda-forge -c bioconda -c r annosine2 biopython blast cd-hit coreutils genericrepeatfinder genometools-genometools glob2    h5py==3.9 keras==2.11 ltr_finder ltr_retriever mdust multiprocess muscle openjdk pandas perl perl-text-soundex pyarrow python r-base r-dplyr regex repeatmodeler r-ggplot2 r-here r-tidyr scikit-learn swifter tensorflow==2.11 tesorter

📌 RepeatMasker Fixes for Long Names

During execution, you may encounter the following error: FastaDB::_cleanIndexAndCompact(): Fasta file contains a sequence identifier which is too long ( max id length = 50 )

To fix this issue, follow the steps below:
Step 1. Edit the RepeatMasker File

  • Access the RepeatMasker file installed in the Conda environment:

    /home/"user"/miniconda3/envs/EDTA/bin/RepeatMasker
  • Locate all occurrences of FastaDB where the following snippet appears:

    my $db = FastaDB->new(
                    fileName    => $file,
                    openMode    => SeqDBI::ReadWrite,
                    maxIDLength => 50
    );
  • Change the value of maxIDLength from 50 to a higher value, for example:

    my $db = FastaDB->new(
                    fileName    => $file,
                    openMode    => SeqDBI::ReadWrite,
                    maxIDLength => 80
     );

Step 2. Edit the ProcessRepeats File

  • Access the ProcessRepeats file:

    /home/"user"/miniconda3/envs/EDTA/share/RepeatMasker/ProcessRepeats
  • Repeat the same procedure to change the value of maxIDLength to 80.
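If you prefer not to edit both files by hand, the substitution can be scripted. This is a minimal sketch under the assumption that every occurrence is written as `maxIDLength => 50`, as in the snippet above; the helper name `raise_max_id_length` is hypothetical and not part of RepeatMasker or AnnoTEP.

```python
import re

# Matches "maxIDLength => 50" with any spacing around the arrow,
# but not larger numbers such as 500.
PATTERN = re.compile(r"(maxIDLength\s*=>\s*)50\b")


def raise_max_id_length(source, new_limit=80):
    """Return (patched_text, number_of_replacements) for a Perl source string."""
    return PATTERN.subn(lambda m: f"{m.group(1)}{new_limit}", source)
```

A cautious way to use it is to read each file into a string, keep a backup copy of the original, write the patched text back, and confirm that the replacement count is what you expect.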

Testing

📥 Downloading genomes

Step 1. You can choose to use your own data or download example genomes for testing:

🧬 Theobroma cacao

wget https://cocoa-genome-hub.southgreen.fr/sites/cocoa-genome-hub.southgreen.fr/files/download/Theobroma_cacao_pseudochromosome_v1.0_tot.fna.tar.gz
tar xvfz Theobroma_cacao_pseudochromosome_v1.0_tot.fna.tar.gz
mv Theobroma_cacao_pseudochromosome_v1.0_tot.fna Tcacao.fasta
rm Theobroma_cacao_pseudochromosome_v1.0_tot.fna.tar.gz

🧬 Arabidopsis thaliana

wget https://www.arabidopsis.org/download_files/Genes/TAIR10_genome_release/TAIR10_chromosome_files/TAIR10_chr_all.fas.gz
gzip -d TAIR10_chr_all.fas.gz
cat TAIR10_chr_all.fas | cut -f 1 -d" " > At.fasta
rm TAIR10_chr_all.fas

Tip

If you can't download Arabidopsis thaliana automatically, you can download it manually from TAIR: click on TAIR10_chr_all.fas.gz and follow the commands above starting from the second line.
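The `cut -f 1 -d" "` step above keeps only the first word of each FASTA header, which also helps to stay under the 50-character identifier limit reported by RepeatMasker. The same idea can be sketched in Python; `simplify_headers` is a hypothetical helper, and unlike `cut` it only touches header lines.

```python
def simplify_headers(lines, max_len=50):
    """Yield FASTA lines with each header cut at its first space.

    Raises ValueError if a shortened identifier is still longer than max_len.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            line = line.split(" ", 1)[0]
            # len(line) - 1 ignores the leading ">" character.
            if len(line) - 1 > max_len:
                raise ValueError(f"identifier too long: {line[1:]}")
        yield line
```

To use it, open the input FASTA, pass the file object to `simplify_headers`, and write each yielded line followed by a newline to the output file.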

Step 2. Run EDTA on the downloaded genome:

cd EDTA
mkdir Athaliana
cd Athaliana

nohup "{absolute-path-to-folder-AnnoTEP}"/EDTA/EDTA.pl --genome "{absolute-path-to-folder-genome}"/At.fasta --species others --step all --sensitive 1 --anno 1 --threads 12 > EDTA.log 2>&1 &

Note

Replace {absolute-path-to-folder-AnnoTEP} and {absolute-path-to-folder-genome} with the appropriate absolute paths.

Step 3. Monitor the progress of the EDTA run:

tail -f EDTA.log

Tip

📌 Adjust the number of threads based on your computer or server's capacity. Set it to the maximum available. In the example above, it is set to 12.
📌 For more accurate TE detection and annotation, enable --sensitive 1. This activates RepeatModeler to identify remaining TEs and other repeats, and it also generates Superfamily and Lineage classifications for TEs.
📌 For a more accurate analysis of the genome, we recommend setting the mutation rate with -u followed by a floating-point value. The values and their explanation are provided in the file LTR-Ages.doc.

Example of usage:

nohup "{absolute-path-to-folder-AnnoTEP}"/EDTA/EDTA.pl --genome "{absolute-path-to-folder-genome}"/At.fasta --species others --step all --sensitive 1 --anno 1 --threads 12 -u 7.0e-9 > EDTA.log 2>&1 & 
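To see why the -u value matters, it helps to look at how a mutation rate turns LTR divergence into an age. The sketch below assumes the commonly used estimate T = K / (2u), with K the Jukes-Cantor-corrected divergence between the two LTRs of an intact element; this README does not state the exact formula used inside the pipeline, and `ltr_age_years` is a hypothetical name.

```python
import math


def ltr_age_years(divergence, mutation_rate):
    """Estimate insertion age from the divergence between the two LTRs.

    divergence    -- proportion of differing sites between 5' and 3' LTR
    mutation_rate -- substitutions per site per year (the -u value)
    """
    if not 0 <= divergence < 0.75:
        raise ValueError("divergence must be in [0, 0.75)")
    # Jukes-Cantor correction for multiple substitutions at the same site.
    k = -0.75 * math.log1p(-4 * divergence / 3)
    # Both LTRs accumulate mutations independently, hence the factor 2.
    return k / (2 * mutation_rate)
```

Under these assumptions, an element with 2% LTR divergence comes out at roughly 0.78 million years with the default rate of 1.3e-8 and at roughly 1.45 million years with 7.0e-9, so a rate suited to the species changes the reported ages considerably.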

Note

Non-autonomous elements (e.g., non-autonomous LARDs and Helitrons) can carry passenger genes. For proper genome annotation, these elements must be partially masked. The modified EDTA pipeline handles this automatically and generates a softmasked genome sequence, available in the EDTA folder as $genome-Softmasked.fa .
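A simple sanity check on the softmasked output is to measure how much of the sequence is masked. The sketch below assumes the usual softmasking convention, in which masked bases are written in lowercase; `softmasked_fraction` is a hypothetical helper, not part of the pipeline.

```python
def softmasked_fraction(lines):
    """Return the share of sequence characters written in lowercase."""
    masked = total = 0
    for line in lines:
        if line.startswith(">"):
            continue  # header lines carry no sequence
        seq = line.strip()
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0
```

Passing an open file handle for the softmasked FASTA gives a value between 0 and 1 that can be compared with the repeat content you expect for the species.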

Generating Graphs

Step 1. Create and activate the virtual environment: Before proceeding, deactivate the Conda environment if it is active to prevent dependency conflicts. Then navigate to the AnnoTEP folder, create a Python virtual environment to keep the setup isolated, and install the required dependencies by running the commands below:

python -m venv .results
. .results/bin/activate

pip install -r requirements.txt 

Step 2. Run the processing script: Next, navigate to the folder created to store the annotated genome (e.g., Athaliana) and run the command below to generate new data and graphs from the input genome (e.g., At.fasta):

cd {absolute-path-to-folder-AnnoTEP}/EDTA/Athaliana
python -u {absolute-path-to-folder-AnnoTEP}/Scripts/process_graphic.py At.fasta

Tip

Make sure to replace At.fasta with the name of the input file you wish to process, if it is different.

At the end of the analysis, three main directories will be generated: TE-REPORT, LTR-AGE, and TREE, each containing detailed results and relevant visualisations. In the results section, each generated graph will be described in detail.


Running AnnoTEP CLI in Alternative Ways

AnnoTEP CLI can also be executed using an alternative method. Follow the steps below to set up and run the CLI.

Step 1. Set Up the Virtual Environment: After installing the required libraries and the conda environment, navigate to the bash-interface directory within the AnnoTEP folder. Create a Python virtual environment and install the necessary libraries:

python -m venv .bashinterface

. .bashinterface/bin/activate
pip install -r requirements.txt 

Step 2. Run the AnnoTEP CLI Script: Once the installation is complete, execute the run_annotep.py script. You can check the available options using the -h flag:

python run_annotep.py -h
  • To run the script with your genome file, use the following command:
python run_annotep.py --genome "{absolute-path-to-folder-genomes}"/genome.fasta --threads number

Note

This script uses the same parameters as EDTA. The key difference is that it automatically generates the graphs without requiring additional commands.


Using AnnoTEP with Graphical User Interface

Important

Before proceeding, ensure that all required libraries and the conda environment have been installed.

Step 1. Set Up the Virtual Environment: Navigate to the graphic-interface folder within the AnnoTEP directory. Create a Python virtual environment and install the necessary libraries:

python -m venv .graphic

. .graphic/bin/activate
pip install -r requirements.txt 

Note

The requirements.txt file contains essential libraries, such as Flask and python-dotenv. If any package fails to install, you may need to install it manually.

Step 2. Configure the .flaskenv File: Create and configure a .flaskenv file. This file is essential for setting up Flask and enabling email functionality. Below is an example configuration:

FLASK_APP = "main.py"
FLASK_DEBUG = True
FLASK_ENV = development

MAIL_SERVER=server-email
MAIL_PORT=number
MAIL_USE_TLS=True or False
MAIL_USE_SSL=True or False
MAIL_USERNAME=your-email-address
MAIL_PASSWORD=app*password*

Tip

Email Server Settings:

  • Gmail:
    • Server: smtp.gmail.com
    • Port: 587 (TLS) or 465 (SSL)
  • Outlook:
    • Server: smtp.office365.com
    • Port: 587 (TLS)

App Password for Gmail:
To generate an app password for Gmail, follow these steps:

  1. Go to your Google Account settings.
  2. Search for "App Passwords" in the search bar.
  3. Generate a new app password and use it in the MAIL_PASSWORD field.

Warning

Security Recommendations:

  • You do not need to use your primary Gmail account. You can create and use any email address for this method.
  • When sharing this configuration, never share the .flaskenv file or its contents, as it contains sensitive information.
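Before starting the application, the mail settings can be checked for the most common mismatch, which is a port that does not fit the chosen encryption. This is a minimal sketch based only on the Gmail and Outlook values listed above; `parse_env` and `mail_problems` are hypothetical helpers, and servers with other conventions may legitimately differ.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blanks, comments and optional quotes."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def mail_problems(settings):
    """Return a list of inconsistencies in the mail-related settings."""
    problems = []
    tls = settings.get("MAIL_USE_TLS", "").lower() == "true"
    ssl = settings.get("MAIL_USE_SSL", "").lower() == "true"
    port = settings.get("MAIL_PORT", "")
    if tls == ssl:
        problems.append("enable exactly one of MAIL_USE_TLS and MAIL_USE_SSL")
    elif tls and port != "587":
        problems.append("TLS is normally used with port 587")
    elif ssl and port != "465":
        problems.append("SSL is normally used with port 465")
    return problems
```

Reading the .flaskenv file into a string and passing it through both functions returns an empty list when the encryption flags and the port agree.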

Step 3. Run the Application: Inside the graphic-interface folder and with the virtual environment activated, start the application by running:

flask run

If all settings are correct, you will see a message similar to this:

 * Serving Flask app 'main.py' (lazy loading)
 * Environment: development
 * Debug mode: on
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 264-075-516

Step 4. Access the Platform: Click on the link http://127.0.0.1:5000/ or copy and paste it into your browser to access the platform and start testing it.



Installing with Container

AnnoTEP can be installed on the machine in different ways, one of which is using Docker. The tool is available in two formats: with a graphical interface and without an interface (terminal mode). To follow the steps below, you need to have Docker installed on your machine. You can download it directly from the official Docker website.

Graphic User Interface - GUI

Important

For this version, your machine must have access to the internet.

Open the terminal and run the following commands:

Step 1. Download the AnnoTEP Image: Open your terminal and run the following command to download the AnnoTEP Docker image:

docker pull annotep/annotep-gui:v1

Step 2. Run the Container: Next, run the container using the command below. Specify a folder on your machine to store the annotation results:

docker run -it -v "{folder-results}":/root/TEs/graphic-interface/results -dp 0.0.0.0:5000:5000 annotep/annotep-gui:v1

Tip

Description:

  • -v {folder-results}:/root/TEs/graphic-interface/results: Creates a volume between your machine and the container to store results. Replace {folder-results} with the path to a folder on your machine. If the folder doesn't exist, Docker will create it. The path /root/TEs/graphic-interface/results is the directory inside the container and should not be changed.
  • -dp 0.0.0.0:5000:5000: Runs the container in the background (-d) and maps port 5000 on the container to port 5000 on your machine (-p).
  • annotep/annotep-gui:v1: Specifies the Docker image to use.

📌 For testing, you can download the Arabidopsis thaliana (Chromosome 4) file AtChr4.fasta from the repository. The annotation process may take approximately 1 hour if 10 threads are used.

Example:

docker run -it -v /home/"user"/results-annotep:/root/TEs/graphic-interface/results -dp 0.0.0.0:5000:5000 annotep/annotep-gui:v1
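The port mapping only works if port 5000 is not already taken on your machine. A minimal way to check this beforehand is sketched below; `port_is_free` is a hypothetical helper, and a successful bind only shows that the port was free at the moment of the check.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if nothing on this machine is already bound to the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True
```

If `port_is_free(5000)` returns False, stop the service that is using the port, or map a different host port in the first number of the -p value and use that port in the browser address.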

Step 3. Access the AnnoTEP interface: After running the container, access the AnnoTEP interface by typing the following address into your web browser: 127.0.0.1:5000

Note

📌 When you access 127.0.0.1:5000, you will see a version of the AnnoTEP platform similar to the web version.

📌 This version includes a field to specify the number of threads. It is recommended to have at least 4 threads available on your machine. Note that fewer threads will result in longer analysis times.

Step 4. Submit Data for Analysis: In the graphical interface, input the required data, such as:

  • Email Address: The address that will receive notifications about the process status.
  • Genome: The genome file to be analysed.
  • Features: Choose the type of analysis to be performed.

When the process is completed without errors, you will receive an email informing you that the results are available in the results folder you specified with -v ({folder-results}).

Step 5. Monitor Progress via Docker Logs: To monitor the annotation progress, use the Docker logs:

  1. In the terminal, type:
docker ps
  2. A list of active containers will appear. Copy the CONTAINER ID of the AnnoTEP image.
  3. Use the following command to view the logs:
docker logs -f "CONTAINER ID"

Important

  • Avoid shutting down your machine during the process, as this may interrupt the analysis. Even when using the web interface, processing occurs locally on your machine.
  • Annotation speed depends on your machine's performance. Ensure your system meets the recommended requirements for optimal results.

Command Line Interface - CLI

While the primary focus of AnnoTEP is its user-friendly graphical interface, we also provide a Docker version designed exclusively for command-line use. This option caters to researchers who prefer or are more accustomed to working in a terminal environment. The configurable parameters in the Docker version closely mirror those offered by the EDTA pipeline, ensuring a consistent and flexible experience for diverse workflows.

Step 1. Download the AnnoTEP Image: To get started, download the AnnoTEP CLI Docker image by running the following command:

docker pull annotep/annotep-cli:v1

Step 2. Display the User Guide: Use the -h parameter to display a detailed guide on how to use the script:

docker run annotep/annotep-cli:v1 python run_annotep.py -h

This will show the following usage instructions:

usage: run_annotep.py [-h] --genome GENOME --threads THREADS
                      [--species {Rice,Maize,others}]
                      [--step {all,filter,final,anno}] [--sensitive {0,1}]
                      [--overwrite {0,1}] [--anno {0,1}] [--evaluate {0,1}]
                      [--force {0,1}] [--u U] [--maxdiv [0-100]] [--cds CDS]
                      [--curatedlib CURATEDLIB] [--exclude EXCLUDE]
                      [--rmlib RMLIB] [--rmout RMOUT]

Run annotep with specified parameters.

optional arguments:
  -h, --help            show this help message and exit

required arguments:
  --genome GENOME       The genome FASTA file (.fasta)
  --threads THREADS     Number of threads used to complete annotation (default threads: 4)

optional arguments:
  --species {Rice,Maize,others}
                        Specify the species for identification of TIR candidates. Default: others
  --step {all,filter,final,anno}
                        Specify which steps you want to run EDTA.
  --sensitive {0,1}     Use RepeatModeler to identify remaining TEs (1) or not (0, default). This step may help to recover some TEs.
  --overwrite {0,1}     If previous raw TE results are found, decide to overwrite (1, rerun) or not (0, default).
  --anno {0,1}          Perform (1) or not perform (0, default) whole-genome TE annotation after TE library construction.
  --evaluate {0,1}      Evaluate (1) classification consistency of the TE annotation. (--anno 1 required).
  --force {0,1}         When no confident TE candidates are found: 0, interrupt and exit (default); 1, use rice TEs to continue.
  --u U                 Neutral mutation rate to calculate the age of intact LTR elements. Intact LTR age is found in this file: *EDTA_raw/LTR/*.pass.list. Default: 1.3e-8 (per bp per year, from rice).
  --maxdiv [0-100]      Maximum divergence (0-100, default: 40) of repeat fragments comparing to library sequences.
  --cds CDS             Provide a FASTA file containing the coding sequence (no introns, UTRs, nor TEs) of this genome or its close relative.
  --curatedlib CURATEDLIB
                        Provided a curated library to keep consistant naming and classification for known TEs. TEs in this file will be trusted 100%, so please ONLY provide MANUALLY CURATED ones. This option is not mandatory. It is totally OK if no file is provided (default).
  --exclude EXCLUDE     Exclude regions (bed format) from TE masking in the MAKER.masked output. Default: undef. (--anno 1 required).
  --rmlib RMLIB         Provide the RepeatModeler library containing classified TEs to enhance the sensitivity especially for LINEs. If no file is provided (default), EDTA will generate such file for you.
  --rmout RMOUT         Provide your own homology-based TE annotation instead of using the EDTA library for masking. File is in RepeatMasker .out format. This file will be merged with the structural-based TE annotation. (--anno 1 required). Default: use the EDTA library for annotation.

Step 3. Run the Container: To simplify this step, we recommend creating a folder to store your genomic data in FASTA format. Once created, run the container using the command below as a guide. Ensure you provide the full path to the folder where you want to save the results, as well as the full path to the genomes folder:

docker run -it -v "{folder-results}":/root/TEs/bash-interface/results -v "{absolute-path-to-folder-genomes}":"{absolute-path-to-folder-genomes}" annotep/annotep-cli:v1 python run_annotep.py --genome "{absolute-path-to-folder-genomes/genome.fasta}" --threads "{number}"

Tip

Description:

  • -v {folder-results}:/root/TEs/bash-interface/results: Creates a volume between your machine and the container to store results. Replace {folder-results} with the path to a folder on your machine. If the folder doesn't exist, Docker will create it. The path /root/TEs/bash-interface/results is the directory inside the container and should not be changed.
  • -v {absolute-path-to-folder-genomes}:{absolute-path-to-folder-genomes}: Mounts the folder containing your genomes inside the container at the same path, so that the genome file can be read there. Ensure you provide the correct path to the folder containing your genomes.
  • --genome {absolute-path-to-folder-genomes/genome.fasta}: Specify the full path to the genome file you want to annotate.
  • --threads {number}: Define the number of threads to use.

📌 For testing, you can download the Arabidopsis thaliana (Chromosome 4) file AtChr4.fasta from the repository. The annotation process may take approximately 1 hour if 10 threads are used.

Example:

docker run -it -v /home/"user"/results-annotep:/root/TEs/bash-interface/results -v /home/"user"/Documents/TEs:/home/"user"/Documents/TEs annotep/annotep-cli:v1 python run_annotep.py --genome /home/"user"/Documents/TEs/AtChr4.fasta --threads 12 --sensitive 1 --anno 1
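Because the container can only read folders that were mounted with -v, the path given to --genome has to lie inside the mounted genomes folder. The sketch below assembles the command as an argument list and checks that condition first; `cli_command` is a hypothetical helper, while the image name and the container results path are the ones used in this section.

```python
from pathlib import PurePosixPath

IMAGE = "annotep/annotep-cli:v1"
CONTAINER_RESULTS = "/root/TEs/bash-interface/results"


def cli_command(results_dir, genomes_dir, genome, threads, extra=()):
    """Build the docker run argument list for one annotation.

    Raises ValueError when the genome file is not inside genomes_dir,
    because only mounted folders are visible inside the container.
    """
    genomes_dir = PurePosixPath(genomes_dir)
    genome = PurePosixPath(genome)
    if not (genomes_dir.is_absolute() and genomes_dir in genome.parents):
        raise ValueError(f"{genome} is not inside the mounted folder {genomes_dir}")
    return [
        "docker", "run", "-it",
        "-v", f"{results_dir}:{CONTAINER_RESULTS}",
        "-v", f"{genomes_dir}:{genomes_dir}",
        IMAGE, "python", "run_annotep.py",
        "--genome", str(genome),
        "--threads", str(threads),
        *extra,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from an interactive terminal, with optional EDTA-style parameters such as `["--sensitive", "1", "--anno", "1"]` supplied through `extra`.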

Step 4. Monitor the Annotation Process: Wait for the genome annotation to complete. You can monitor the progress directly through the terminal.


Important

Resolving Memory Issues in Docker Containers
If Docker containers experience memory issues or unexpected terminations due to intensive resource usage, you can adjust the process limits (--pids-limit) and swap memory (--memory-swap). Example usage:

docker run -it -v "{folder-results}":/root/TEs/graphic-interface/results -dp 0.0.0.0:5000:5000 --pids-limit "{threads x 10000}" --memory-swap -1 annotep/annotep-gui:v1

Explanation:

  • --pids-limit {threads x 10000}: Sets the maximum number of processes the container can create. For example, if you use 12 threads, set this value to 120,000. This ensures each thread can create subprocesses without hitting the process limit, maintaining performance under high load.
  • --memory-swap -1: Disables the swap memory limit, allowing the container to use unlimited virtual memory. This helps avoid errors when physical RAM is insufficient.
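The arithmetic behind the process limit is simple enough to script when the thread count varies between runs. A minimal sketch, assuming the threads x 10000 rule described above; `resource_flags` is a hypothetical helper name.

```python
def resource_flags(threads, pids_per_thread=10_000):
    """Return the extra docker run flags for a given thread count."""
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    return ["--pids-limit", str(threads * pids_per_thread), "--memory-swap", "-1"]
```

For 12 threads this yields `--pids-limit 120000 --memory-swap -1`, which goes before the image name in the docker run command.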

Results

In addition to FASTA libraries, GFF3 files, and softmasking outputs, AnnoTEP also generates informative graphs and detailed reports based on the data obtained during the annotation process.

TE-REPORT

In the TE-REPORT directory, you will find a table that categorises TEs hierarchically by order, superfamily, and autonomy, along with metrics such as base pairs, size, and percentage. This data is visualised using bar charts and bubble charts.

📌 TEs-Report-Complete.txt: A comprehensive table listing the classifications of TEs, including partial elements, which are labelled with the suffix “-like” (e.g., Angela-like).

📌 TEs-Report-Lite.txt: A simplified report derived from the complete version, containing concise and accessible information.

📌 TE-Report*: These charts, generated from the TEs-Report-Lite.txt file, provide a clear and informative visualisation of TEs, categorised by hierarchical levels.

📌 RepeatLandScape.*: This graph provides a coherent and easily understandable inference of the relative ages of each repetitive element identified in a specific genome. The analysis is based on the genetic distance calculation proposed by Kimura, which estimates the time elapsed since duplication or insertion events of these elements.
By applying Kimura's calculation, the graph distinguishes older elements (with greater accumulated divergence) from more recent ones (with lower divergence), offering valuable insights into the evolutionary dynamics and genomic history of the organism under study.
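To make the distance concrete, the sketch below computes a Kimura distance from counts of transitions and transversions in an alignment between a repeat copy and its consensus. It assumes the two-parameter model (K2P), which is the variant commonly used for repeat landscapes; this README only says that the calculation was proposed by Kimura, and `kimura_2p` is a hypothetical name.

```python
import math


def kimura_2p(transitions, transversions, aligned_sites):
    """Kimura two-parameter distance between a repeat copy and its consensus."""
    p = transitions / aligned_sites      # proportion of transition differences
    q = transversions / aligned_sites    # proportion of transversion differences
    a = 1 - 2 * p - q
    b = 1 - 2 * q
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent for the K2P model")
    return -0.5 * math.log(a) - 0.25 * math.log(b)
```

With 10 transitions and 5 transversions over 100 aligned sites, the raw difference is 15%, while the corrected distance is about 0.17, because the model accounts for sites that changed more than once.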

LTR-AGE

The LTR-AGE directory contains charts that estimate the ages of LTR Gypsy and LTR Copia elements:
📌 AGE-Gypsy.* and AGE-Copia.*: The histogram displays the age distribution of LTR elements identified in the genome. The dashed vertical lines indicate the median age, while the horizontal line represents the mean, both expressed in million years (Mya). This visualisation provides a clear analysis of the dispersion of LTR ages, highlighting the central tendency and temporal variability of these elements.

TREE

The TREE directory contains phylogenetic charts for the alignments of all LTR-RT domains:

📌 LTR_RT-Tree1.*, LTR_RT-Tree3.*, LTR_RT-Tree4.*: These charts represent the phylogeny of lineage alignments within LTR superfamilies, providing a comprehensive visualisation of their evolutionary relationships. The phylogeny illustrates how different LTR-RT domains are related to each other based on their genetic sequences.

📌 LTR_RT-Tree2.*: A circular chart where:

  • The outer circle (purple) represents the length (in base pairs) occupied by each element.
  • The inner circle (red) represents the number of occurrences of each element.


List of genomes tested in this pipeline

AnnoTEP offers the capability to analyse a wide range of plants, algae, and microalgae that have not yet been explored or are underrepresented in previous studies. This approach enables the discovery of new TEs and genomic patterns that could be crucial for advancements in areas such as genomic evolution, species adaptation, and biotechnology. By focusing on less-studied genomes, AnnoTEP opens doors to groundbreaking research and contributes to filling gaps in the current understanding of TE diversity and functionality.

Genome | Common Name | Size
Amborella trichopoda (v1.0) | Amborella | 706.33 Mb
Ananas comosus (v1) | Pineapple | 381.91 Mb
Anthoceros angustus | Hornwort | 119.35 Mb
Arabidopsis lyrata (V2.1) | Lyrate Rockcress | 206.67 Mb
Arabidopsis thaliana (TAIR10) | Thale cress | 119.67 Mb
Azolla filiculoides | Mosquito fern | 622.59 Mb
Brachypodium distachyon (ABR2 v1) | Stiff brome | 271.43 Mb
Brassica oleracea capitata (v1.0) | Cabbage | 385.01 Mb
Carnegiea gigantea | Saguaro | 1.14 Gb
Ceratodon purpureus | Moss | 349.46 Mb
Chlamydomonas reinhardtii | Green algae | 114.63 Mb
Coffea arabica | Arabian coffee | 1.19 Gb
Conticribra weissflogii | Diatoms | 231.50 Mb
Cryptomonas gyropyrenoidosa | Cryptomonads | 0.74 Mb
Cycas panzhihuaensis | Dukou sago palm | 10.48 Gb
Cyanidia caldarium | Red algae | 8.79 Mb
Cyanidiococcus yangmingshanensis | Red algae | 12.02 Mb
Cyanophora paradoxa | Freshwater Glaucophyte | 99.94 Mb
Diacronema lutheri | Haptophytes | 43.50 Mb
Eucalyptus grandis (v2.0) | Rose gum | 691.35 Mb
Euglena gracilis | Unicellular algae | 2.37 Gb
Fragaria x ananassa (Royal Royce v1.0) | Strawberries | 786.54 Mb
Galdieria yellowstonensis | Red algae | 14.51 Mb
Ginkgo biloba | Maidenhair trees | 2.64 Gb
Gnetum montanum | Joint fir | 3.79 Gb
Gossypium hirsutum (v3.1) | Cotton | 2.28 Gb
Hevea brasiliensis | Rubber tree | 1.88 Gb
Isoetes taiwanensis | Quillwort | 1.66 Gb
Lotus japonicus | Miyakogusa | 553.71 Mb
Malpighia emarginata | Acerola | 1.03 Gb
Malus domestica (v1.1) | Apple | 709.56 Mb
Manihot esculenta (V8.1) | Cassava | 639.59 Mb
Marchantia polymorpha (v3.0) | Common liverwort | 225.76 Mb
Mimosa bimucronata | Maricá | 640.55 Mb
Mimosa pudica | Sensitive Plant | 797.25 Mb
Musa acuminata (Pahang) | Banana | 484.06 Mb
Nelumbo nucifera | Sacred lotus | 821.29 Mb
Nepenthes gracilis | Pitcher plant | 752.88 Mb
Oryza sativa (v7.0) | Rice | 374.47 Mb
Passiflora edulis | Passion fruit | 1.34 Gb
Phaseolus vulgaris (v2.1) | Common bean | 537.22 Mb
Physcomitrium patens (6.1) | Moss | 481.75 Mb
Populus trichocarpa (v4.1) | Black cottonwood | 392.16 Mb
Prunus persica (v2.1) | Peach | 227.41 Mb
Psidium guajava | Guava | 443.76 Mb
Quercus rubra (v2.1) | Northern red oak | 739.58 Mb
Salix purpurea (5.1) | Basket willow | 329.29 Mb
Salvinia cucullata | Small rat's ear | 231.85 Mb
Saccharum officinarum x spontaneum R570 (v2.1) | Sugarcane | 5.05 Gb
Selaginella moellendorffii | Spikemoss | 212.32 Mb
Setaria viridis (v4.1) | Green foxtail | 397.28 Mb
Sherardia arvensis | Field madder | 441.30 Mb
Skeletonema tropicum | Centric diatoms | 78.78 Mb
Solanum lycopersicum (ITAG5.0) | Tomato | 801.81 Mb
Solanum tuberosum (v6.1) | Potato | 741.59 Mb
Sorghum bicolor (v5.1) | Broomcorn | 719.89 Mb
Theobroma cacao (v2.1) | Cacao | 341.71 Mb
Theobroma grandiflorum (C1074) | Cupuassu | 423.92 Mb
Utricularia gibba | Floating bladderwort | 100.69 Mb
Vitis vinifera (v2.1) | Grape vine | 486.20 Mb
Welwitschia mirabilis | Tree Tumbo | 6.87 Gb
Zea mays | Maize | 2.14 Gb

Genomes Under Analysis

This section lists the genomes currently being analysed using the AnnoTEP pipeline. The results will be updated as the analysis progresses.

Genome | Common Name | Size
Ceratopteris richardii (v2.1) | Fern | 7.46 Gb
Helianthus annuus (r1.2) | Sunflower | 3.03 Gb
Pinus tabuliformis | Chinese pine | 24.41 Gb
Triticum aestivum cv. Chinese Spring (v2.1) | Bread wheat | 14.58 Gb


Citations

  • Coming Soon

Questions and Issues

  • Coming Soon