A tool to assign ancestral linkage units and/or identify fusion/fission events in Lepidopteran chromosomes based on a set of reference BUSCO genes as markers.
lep_fusion_fission_finder.py takes the BUSCO full_table.tsv output files for two species (a query and a reference), along with an optional prefix for the output files (specified with -f, default "fsf"). The default window size for Lepidoptera is 17 BUSCOs, but this can be changed with the -w flag, e.g.:
python3 lep_fusion_fission_finder.py -q test_data/Aglais_io_full_table.tsv -r test_data/Melitaea_cinxia_full_table.tsv -f Aglais -w 17
This will write three files:
- Aglais_chromosome_assignments.tsv: a summary of the assignments for each scaffold in the query genome. For fused/split chromosomes, their putative origins are listed.
- Aglais_warnings.tsv: a list of warnings. It lists any contigs with fewer BUSCOs than the specified threshold (default: 17), and records the total number of linkage units found if it differs from the expected 31.
- Aglais_fusion_positions.tsv: for each chromosome that is inferred to be a product of fusion, the start and end position of each ancestral block is reported.
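To get a feel for which sequences are likely to end up in the warnings file, the per-sequence BUSCO counts can be checked before running the tool. The following is a minimal sketch, not the tool's own code: it assumes the standard BUSCO full_table.tsv layout (comment lines starting with `#`, then tab-separated BUSCO id, status and sequence name) and assumes that only BUSCOs with status `Complete` count towards the threshold. The function names are hypothetical.

```python
import csv
import io
from collections import Counter


def count_complete_buscos(handle):
    """Count 'Complete' BUSCOs per sequence in a BUSCO full_table.tsv."""
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the '#'-prefixed header/comment lines.
        if not row or row[0].startswith("#"):
            continue
        # Rows for missing BUSCOs carry no sequence column, so check the length.
        if len(row) >= 3 and row[1] == "Complete":
            counts[row[2]] += 1
    return counts


def sequences_below_threshold(counts, threshold=17):
    """Return the names of sequences with fewer BUSCOs than the threshold."""
    return sorted(seq for seq, n in counts.items() if n < threshold)


# Tiny made-up table, for illustration only (not real BUSCO results).
example = io.StringIO(
    "# Busco id\tStatus\tSequence\tGene Start\tGene End\n"
    "1at7088\tComplete\tchr1\t100\t900\n"
    "2at7088\tComplete\tchr1\t1500\t2400\n"
    "3at7088\tMissing\n"
    "4at7088\tComplete\tscaffold_9\t50\t700\n"
)
counts = count_complete_buscos(example)
print(sequences_below_threshold(counts, threshold=2))
```

For a real file, pass an open file handle (for example `open("test_data/Aglais_io_full_table.tsv")`) in place of the in-memory example.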
Full usage:
usage: lep_fusion_fission_finder.py [-h] -r REFERENCE_TABLE -q QUERY_TABLE [-f PREFIX] [-w WINDOW_SIZE]
optional arguments:
  -h, --help            show this help message and exit
  -r REFERENCE_TABLE, --reference_table REFERENCE_TABLE
                        full_table.tsv file for reference species
  -q QUERY_TABLE, --query_table QUERY_TABLE
                        full_table.tsv for query species
  -f PREFIX, --prefix PREFIX
                        Prefix for all output files
  -w WINDOW_SIZE, --window_size WINDOW_SIZE
                        Number of BUSCOs to be used per window (must be odd)
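The README does not describe how the window is applied internally. As an illustration only, the sketch below shows one common way an odd-sized window is used: each marker is assigned the most frequent label among itself and an equal number of neighbours on either side, which is only possible when the window size is odd. The function name `windowed_majority` and the labels `M1`/`M2` are hypothetical and are not taken from the tool.

```python
from collections import Counter


def windowed_majority(labels, window=17):
    """Majority label in a centred window around each marker.

    Only markers with a full window on both sides receive a call.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2  # number of neighbours taken on each side
    calls = []
    for centre in range(half, len(labels) - half):
        block = labels[centre - half : centre + half + 1]
        calls.append(Counter(block).most_common(1)[0][0])
    return calls


# Ten ordered markers along one chromosome: five from unit M1, then five from M2.
markers = ["M1"] * 5 + ["M2"] * 5
print(windowed_majority(markers, window=3))
```

A larger window smooths over isolated mis-assigned markers but needs more BUSCOs per sequence, which is consistent with sequences below the window size being reported as warnings.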
map_fusion_fissions.py takes the output of lep_fusion_fission_finder.py and infers where fusion/fission events occurred in a given tree.
./map_fusion_fissions.py -i chr_assignments/ -tree spp.treefile -t 1 -o output/ -f test_run
This will write a file called mapped_fusions_fissions.tsv, which contains a list of each fusion/fission event.
Full usage:
usage: map_fusion_fissions_client.py [-h] -i INPUT_DATA [-tree TREE] -o OUTPUT [-f PREFIX] [-t THRESHOLD] [-l LABEL_STATUS]
optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_DATA, --input_data INPUT_DATA
                        path to lep_fusion_fission_finder output
  -tree TREE, --tree TREE
                        Phylogenetic tree
  -o OUTPUT, --output OUTPUT
                        output location relative to working directory
  -f PREFIX, --prefix PREFIX
                        Prefix for all output files
  -t THRESHOLD, --threshold THRESHOLD
                        Threshold for rearrangement to be shared between tips
  -l LABEL_STATUS, --label_status LABEL_STATUS
                        Specify if tree already contains internal node labels
adjust_coordinates_of_fusions.py takes the tsv file containing the fusion coordinates (generated by lep_fusion_fission_finder.py) and adjusts the final portion of each fused chromosome so that the reported end coordinate reflects the end of the chromosome (i.e. the chromosome length) rather than the position of the last detected ortholog. This produces an adjusted tsv that can be used for downstream exploration of fusions.
./adjust_coordinates_of_fusions.py --help
usage: adjust_coordinates_of_fusions.py [-h] -f FUSIONS -i INDEX -p PREFIX
optional arguments:
  -h, --help            show this help message and exit
  -f FUSIONS, --fusions FUSIONS
                        Fusion position output from LFSF
  -i INDEX, --index INDEX
                        index file for genome
  -p PREFIX, --prefix PREFIX
                        prefix for output file
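The adjustment itself can be pictured with a short sketch. This is not the script's implementation: it assumes the index file is a samtools faidx-style `.fai` (sequence name in the first column, length in the second), and it represents each ancestral block as a hypothetical `(chromosome, unit, start, end)` tuple, since the column layout of the fusion positions file is not given here.

```python
def read_lengths(index_lines):
    """Map sequence name -> length from faidx-style lines (name<TAB>length...)."""
    lengths = {}
    for line in index_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def extend_last_blocks(blocks, lengths):
    """Set the end of the right-most block on each chromosome to its length.

    blocks: list of (chromosome, unit, start, end) tuples (assumed layout).
    """
    last_index = {}
    for i, (chrom, _unit, _start, end) in enumerate(blocks):
        # Remember the block with the largest end coordinate per chromosome.
        if chrom not in last_index or end > blocks[last_index[chrom]][3]:
            last_index[chrom] = i
    adjusted = list(blocks)
    for chrom, i in last_index.items():
        if chrom in lengths:  # leave blocks untouched if the length is unknown
            c, unit, start, _end = adjusted[i]
            adjusted[i] = (c, unit, start, lengths[chrom])
    return adjusted


# Made-up values, for illustration only.
index = ["chr1\t1000\t6\t60\t61\n"]
blocks = [("chr1", "M1", 10, 400), ("chr1", "M2", 420, 950)]
print(extend_last_blocks(blocks, read_lengths(index)))
```

Only the final block of each chromosome changes; earlier blocks keep the coordinates of their last detected ortholog.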