charlottewright/lep_busco_painter

Lep busco painter

Paints chromosomes of lepidopteran genomes with BUSCOs.

Installation

conda create -n buscopaint python=3.9
conda activate buscopaint
conda install -c bioconda samtools
conda install -c conda-forge r-base
conda install -c r r-tidyverse
conda install -c bioconda r-optparse

Running the scripts

1. Assign each BUSCO to a chromosome

buscopainter.py takes the full_table.tsv output files generated by BUSCO for a "reference" genome (-r) and a "query" genome (-q), along with an optional output prefix (-p, default "buscopainter"). It assigns each BUSCO to a chromosome and states whether it belongs to the dominant group of BUSCOs on that chromosome ('self') or not.

buscopainter.py -r test_data/ilAglIoxx1_full_table.tsv -q test_data/ilApoTurb1_full_table.tsv
buscopainter.py -r test_data/Merian_elements_full_table.tsv -q test_data/ilApoTurb1_full_table.tsv

It will write three TSV files:

  • [PREFIX]_complete_summary.tsv which contains a summary of the chromosomal assignments.
  • [PREFIX]_complete_location.tsv which contains the location and status of all shared complete BUSCOs.
  • [PREFIX]_duplicated_location.tsv which contains the location and status of all duplicated BUSCOs.
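The assignment described in this step can be illustrated with a minimal sketch. It is not the code in buscopainter.py. It assumes the standard BUSCO full_table.tsv layout (tab-separated, '#' header lines, columns starting with Busco id, Status, Sequence, Gene Start), and it assumes that the "dominant group" on a query chromosome is the reference chromosome contributing the most shared complete BUSCOs. The function names and the 'other' label for non-dominant BUSCOs are hypothetical.

```python
import io
from collections import Counter, defaultdict


def parse_full_table(handle):
    """Return {busco_id: (sequence, start)} for rows with status 'Complete'."""
    hits = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Skip '#' header lines and rows without coordinates (e.g. 'Missing').
        if line.startswith("#") or len(fields) < 5:
            continue
        busco_id, status, sequence, start = fields[:4]
        if status == "Complete":
            hits[busco_id] = (sequence, int(start))
    return hits


def assign_buscos(ref_hits, query_hits):
    """Label each shared BUSCO with its reference chromosome and a status."""
    shared = sorted(set(ref_hits) & set(query_hits))
    # Per query chromosome, count BUSCOs coming from each reference chromosome.
    counts = defaultdict(Counter)
    for busco_id in shared:
        counts[query_hits[busco_id][0]][ref_hits[busco_id][0]] += 1
    dominant = {q: c.most_common(1)[0][0] for q, c in counts.items()}
    rows = []
    for busco_id in shared:
        query_chr, position = query_hits[busco_id]
        ref_chr = ref_hits[busco_id][0]
        # 'other' is a placeholder label, not necessarily the tool's wording.
        status = "self" if ref_chr == dominant[query_chr] else "other"
        rows.append((busco_id, query_chr, position, ref_chr, status))
    return rows


# Tiny made-up tables to show the call pattern (not real BUSCO output).
REF_TABLE = (
    "# Busco id\tStatus\tSequence\tGene Start\tGene End\tStrand\tScore\tLength\n"
    "b1\tComplete\tref1\t100\t200\t+\t50.0\t100\n"
    "b2\tComplete\tref1\t300\t400\t+\t50.0\t100\n"
    "b3\tComplete\tref2\t100\t200\t-\t50.0\t100\n"
    "b4\tMissing\n"
)
QUERY_TABLE = (
    "b1\tComplete\tq1\t10\t110\t+\t50.0\t100\n"
    "b2\tComplete\tq1\t500\t600\t+\t50.0\t100\n"
    "b3\tComplete\tq1\t900\t1000\t+\t50.0\t100\n"
    "b4\tComplete\tq1\t1200\t1300\t+\t50.0\t100\n"
)
rows = assign_buscos(
    parse_full_table(io.StringIO(REF_TABLE)),
    parse_full_table(io.StringIO(QUERY_TABLE)),
)
for row in rows:
    print(*row, sep="\t")
```

In this toy input, b4 is missing from the reference and is therefore not shared, and b3 is the only BUSCO on q1 that does not come from the dominant reference chromosome.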

2. Plotting

The [PREFIX]_complete_location.tsv and [PREFIX]_duplicated_location.tsv files can be plotted using plot_buscopainter.R. The script draws the chromosomes of the query genome as rectangles and paints the positions of complete/duplicated BUSCOs as lines coloured by their assigned chromosome in the reference genome. It has one required argument, the location.tsv file (-f). Optional arguments are:

  • Plot title (-p)
  • Index file (-i) - enables chromosomes to be drawn to size (rather than based on the position of the last ortholog)
  • Merian element mode (-m True) - paint chromosomes with Merian elements rather than query genome orthologs
  • Only plot differences mode (-d True) - only paint orthologs which do not belong to the dominant chromosome based on the reference
  • Custom threshold of orthologs (-n) - minimum number of orthologs on a given query chromosome for it to be displayed (this helps to filter out unplaced scaffolds). Default is >=3 orthologs.

plot_buscopainter.R -f ilAglIoxx1_complete_location.tsv -p 'ilAglIoxx1'
plot_buscopainter.R -f ilAglIoxx1_complete_location.tsv -p 'ilAglIoxx1' -i ilAglIoxx1.fai -m True -d True
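To make the effect of the -n and -d options concrete, here is a rough Python sketch of the filtering they describe. It is not taken from plot_buscopainter.R. The tuple layout (busco_id, query_chr, position, ref_chr, status), the function name, and the choice to apply the threshold before the differences filter are all assumptions made for illustration.

```python
from collections import Counter


def select_for_plot(rows, minimum=3, differences_only=False):
    """Filter hypothetical (busco_id, query_chr, position, ref_chr, status) rows."""
    rows = list(rows)
    # Count orthologs per query chromosome and drop sparsely covered sequences,
    # which are typically unplaced scaffolds.
    per_chr = Counter(row[1] for row in rows)
    kept = [row for row in rows if per_chr[row[1]] >= minimum]
    if differences_only:
        # Keep only orthologs that are not on their dominant chromosome.
        kept = [row for row in kept if row[4] != "self"]
    return kept


# Made-up rows: three orthologs on chromosome q1, one on a small scaffold.
example_rows = [
    ("b1", "q1", 10, "ref1", "self"),
    ("b2", "q1", 500, "ref1", "self"),
    ("b3", "q1", 900, "ref2", "other"),
    ("b5", "scaffold_9", 40, "ref3", "self"),
]
all_kept = select_for_plot(example_rows)
moved_only = select_for_plot(example_rows, differences_only=True)
```

With the default threshold, the single ortholog on scaffold_9 is dropped, and differences-only mode leaves just b3.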

Full usage:

Options:
	-f CHARACTER, --file=CHARACTER
		location.tsv file

	-p CHARACTER, --prefix=CHARACTER
		prefix for plot title

	-i CHARACTER, --index=CHARACTER
		genome index file

	-m CHARACTER, --merians=CHARACTER
		use this flag if you are comparing a genome to Merian elements

	-d CHARACTER, --differences=CHARACTER
		only colour orthologs that have moved from the dominant chromosome

	-n NUMBER, --minimum=NUMBER
		minimum number of orthologs 

	-h, --help
		Show this help message and exit

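NB: the index file can be generated with samtools faidx. For example, running samtools faidx genome.fasta (where genome.fasta stands for your assembly) writes genome.fasta.fai.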
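As an illustration of why the index helps, a .fai file stores each sequence name in its first column and its length in base pairs in its second, so chromosome lengths can be read directly instead of being inferred from the last ortholog. The sketch below assumes only that standard faidx layout. The function names and the fallback logic are hypothetical and are not taken from plot_buscopainter.R.

```python
import io


def read_fai_lengths(handle):
    """Return {sequence_name: length_in_bp} from a samtools faidx index."""
    lengths = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        lengths[fields[0]] = int(fields[1])
    return lengths


def chromosome_extent(name, ortholog_positions, fai_lengths=None):
    """Length to draw: the indexed length if known, else the last ortholog position."""
    if fai_lengths and name in fai_lengths:
        return fai_lengths[name]
    return max(ortholog_positions)


# Made-up index content: name, length, offset, line bases, line width.
example_fai = io.StringIO("chr1\t1000\t6\t60\t61\nchr2\t500\t1030\t60\t61\n")
lengths = read_fai_lengths(example_fai)
with_index = chromosome_extent("chr1", [100, 900], lengths)
without_index = chromosome_extent("chr1", [100, 900])
```

Here the indexed length (1000) is used when the index is supplied. Without it, the sketch falls back to the last ortholog position (900), which under-draws the chromosome end.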

Example output

Comparison of two genomes - painting all shared single-copy orthologs.

Comparison of one genome to Merian elements - painting only single-copy orthologs that have moved relative to Merian elements.
