Paints chromosomes of lepidopteran genomes with BUSCOs.
Install the dependencies into a conda environment:
conda create -n buscopaint python=3.9
conda activate buscopaint
conda install -c bioconda samtools
conda install -c conda-forge r-base
conda install -c conda-forge r-tidyverse
conda install -c bioconda r-optparse
1. Assign each BUSCO to a chromosome
buscopainter.py takes the full_table.tsv files generated by BUSCO for a "reference" genome (-r) and a "query" genome (-q), along with an optional output prefix (-p, default "buscopainter"). It assigns each BUSCO to a chromosome and states whether it belongs to the dominant group of BUSCOs on that chromosome ('self') or not.
buscopainter.py -r test_data/ilAglIoxx1_full_table.tsv -q test_data/ilApoTurb1_full_table.tsv
To compare a genome against Merian elements, use the Merian element table as the reference:
buscopainter.py -r test_data/Merian_elements_full_table.tsv -q test_data/ilApoTurb1_full_table.tsv
It writes three TSV files:
- [PREFIX]_complete_summary.tsv: a summary of the chromosomal assignments
- [PREFIX]_complete_location.tsv: the location and status of all shared complete BUSCOs
- [PREFIX]_duplicated_location.tsv: the location and status of all duplicated BUSCOs
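The assignment step above can be sketched in a few lines of Python. This is a minimal illustration, not the code in buscopainter.py. It makes several assumptions: the full_table.tsv columns follow the BUSCO v5 order (Busco id, Status, Sequence, Gene Start, Gene End, ...), only 'Complete' BUSCOs are handled, and "dominant" means the most common reference chromosome among the BUSCOs on each query chromosome. The helper names and the "other" label are hypothetical.

```python
from collections import Counter, defaultdict


def read_complete_buscos(lines):
    """Map BUSCO id -> (sequence, start, end) for rows with status 'Complete'.

    ASSUMPTION: BUSCO v5 full_table.tsv column order:
    Busco id, Status, Sequence, Gene Start, Gene End, ...
    """
    buscos = {}
    for line in lines:
        row = line.rstrip("\n").split("\t")
        # Skip blank lines and '#' header/comment lines.
        if not row[0] or row[0].startswith("#"):
            continue
        # 'Missing' rows have only two columns; 'Duplicated' rows are ignored here.
        if len(row) >= 5 and row[1] == "Complete":
            buscos[row[0]] = (row[2], int(row[3]), int(row[4]))
    return buscos


def paint(ref_lines, query_lines):
    """Return one tuple per shared complete BUSCO:
    (busco_id, query_chr, start, end, assigned_ref_chr, status).
    """
    ref = read_complete_buscos(ref_lines)
    query = read_complete_buscos(query_lines)
    shared = [b for b in query if b in ref]

    # Tally which reference chromosome the BUSCOs on each query chromosome come from.
    tallies = defaultdict(Counter)
    for b in shared:
        tallies[query[b][0]][ref[b][0]] += 1
    # ASSUMPTION: dominant = most common reference chromosome per query chromosome.
    dominant = {qchr: c.most_common(1)[0][0] for qchr, c in tallies.items()}

    rows = []
    for b in shared:
        qchr, start, end = query[b]
        assigned = ref[b][0]
        # "other" is a hypothetical label for BUSCOs outside the dominant group.
        status = "self" if assigned == dominant[qchr] else "other"
        rows.append((b, qchr, start, end, assigned, status))
    return rows
```

Open file handles can be passed directly, e.g. `with open(ref_path) as r, open(query_path) as q: rows = paint(r, q)`, because iterating a file yields lines.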
2. Plotting
The [PREFIX]_complete_location.tsv and [PREFIX]_duplicated_location.tsv files can be plotted using plot_buscopainter.R. This plots the chromosomes of the query genome as rectangles and paints the positions of complete/duplicated BUSCOs as lines, coloured by their assigned chromosome in the reference genome. The script has one required argument, the location.tsv file (-f). Optional arguments are:
- Plot title (-p)
- Index file (-i): enables chromosomes to be drawn to size (rather than based on the position of the last ortholog)
- Merian element mode (-m True): paint chromosomes with Merian elements rather than query genome orthologs
- Only plot differences mode (-d True): only paint orthologs which do not belong to the dominant chromosome based on the reference
- Custom ortholog threshold (-n): minimum number of orthologs on a given query chromosome for it to be displayed (this helps to filter out unplaced scaffolds). Default is >=3 orthologs.
plot_buscopainter.R -f ilAglIoxx1_complete_location.tsv -p 'ilAglIoxx1'
plot_buscopainter.R -f ilAglIoxx1_complete_location.tsv -p 'ilAglIoxx1' -i ilAglIoxx1.fai -m True -d True
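How the -n, -i and -d options interact can be illustrated with a short Python sketch. It is not the logic of plot_buscopainter.R, only a hypothetical rendering of the behaviour described above. It assumes each ortholog is a (busco_id, query_chr, end, assigned_chr, status) tuple and that orthologs outside the dominant group have a status other than "self". The function name plot_layout is hypothetical.

```python
from collections import Counter


def plot_layout(rows, minimum=3, differences=False, index_lengths=None):
    """Decide which query chromosomes to draw and which orthologs to paint.

    rows: (busco_id, query_chr, end, assigned_chr, status) tuples (hypothetical layout).
    index_lengths: optional {chromosome: length}, e.g. parsed from a .fai file.
    """
    # Like -n: drop query sequences carrying fewer than `minimum` orthologs.
    counts = Counter(r[1] for r in rows)
    kept = [c for c in counts if counts[c] >= minimum]

    # Like -i: use true lengths when available, else the last ortholog's end.
    lengths = {}
    for c in kept:
        if index_lengths and c in index_lengths:
            lengths[c] = index_lengths[c]
        else:
            lengths[c] = max(r[2] for r in rows if r[1] == c)

    # Like -d: paint only orthologs outside the dominant group.
    painted = [
        r for r in rows
        if r[1] in lengths and (not differences or r[4] != "self")
    ]
    return lengths, painted
```

In differences mode the filtered chromosomes are still drawn at full length; only the painted orthologs are reduced.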
Full usage:
Options:
-f CHARACTER, --file=CHARACTER
location.tsv file
-p CHARACTER, --prefix=CHARACTER
prefix for plot title
-i CHARACTER, --index=CHARACTER
genome index file
-m CHARACTER, --merians=CHARACTER
use this flag if you are comparing a genome to Merian elements
-d CHARACTER, --differences=CHARACTER
only colour orthologs that have moved from the dominant chromosome
-n NUMBER, --minimum=NUMBER
minimum number of orthologs
-h, --help
Show this help message and exit
NB: the index file can be generated with samtools faidx [FASTA], which writes [FASTA].fai.
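The index file is useful here because its first two columns give each sequence's name and length. The Python sketch below shows how those lengths can be read. It relies on the standard samtools .fai layout (tab-separated NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH, plus QUALOFFSET for FASTQ indexes). The function name read_fai is hypothetical and is not part of this repository.

```python
def read_fai(lines):
    """Return {sequence name: length} from a samtools .fai index.

    Each .fai line is tab-separated: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
    (plus QUALOFFSET for FASTQ); only the first two columns are used here.
    """
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or malformed lines.
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths
```

For example, `with open("ilAglIoxx1.fai") as fh: lengths = read_fai(fh)`.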
Figure: Comparison of two genomes, painting all shared single-copy orthologs.
Figure: Comparison of one genome to Merian elements, painting only single-copy orthologs that have moved relative to Merian elements.