Lep fusion fission finder

A tool to assign ancestral linkage units and/or identify fusion/fission events in Lepidopteran chromosomes based on a set of reference BUSCO genes as markers.

Running the scripts

1.) Find fusions/fissions

lep_fusion_fission_finder.py takes the BUSCO full_table.tsv output files for two species (a query and a reference), along with an optional prefix for the output files (specified with -f, default "fsf"). The default window size for Lepidoptera is 17 BUSCOs, but this can be changed with the -w flag, e.g.:

python3 lep_fusion_fission_finder.py -q test_data/Aglais_io_full_table.tsv -r test_data/Melitaea_cinxia_full_table.tsv -f Aglais -w 17

This will write three files:

  • Aglais_chromosome_assignments.tsv: a summary of the assignments for each scaffold in the query genome. For fused or split chromosomes, the putative origins are listed.

  • Aglais_warnings.tsv: a list of warnings. It lists any contigs with fewer BUSCOs than the specified threshold (default: 17) and records the total number of linkage units found if it differs from the expected 31.

  • Aglais_fusion_positions.tsv: for each chromosome that is inferred to be a product of fusion, the start and end position of each ancestral block is reported.
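The warning about contigs with too few BUSCOs can be previewed before running the tool. The sketch below is not the tool's own code: it assumes the standard BUSCO full_table.tsv layout (comment lines starting with `#`, then tab-separated BUSCO ID, status, and sequence name) and assumes that only `Complete` BUSCOs count as markers. The function names are hypothetical.

```python
import csv
from collections import Counter


def count_buscos_per_sequence(lines):
    """Count Complete BUSCOs per sequence from the lines of a full_table.tsv.

    Assumption: columns are BUSCO ID, status, sequence name, ...;
    rows for Missing BUSCOs have only the first two columns.
    """
    counts = Counter()
    data_lines = (line for line in lines if not line.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        # Skip blank lines and rows without a sequence name.
        if len(row) >= 3 and row[1] == "Complete":
            counts[row[2]] += 1
    return counts


def sequences_below_threshold(counts, threshold=17):
    """Return the names of sequences carrying fewer markers than the threshold."""
    return sorted(seq for seq, n in counts.items() if n < threshold)
```

With a real table this could be called as `count_buscos_per_sequence(open("test_data/Aglais_io_full_table.tsv"))`; sequences with no Complete BUSCOs at all do not appear in the counts and would need to be checked separately.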

Full usage:

usage: lep_fusion_fission_finder.py [-h] -r REFERENCE_TABLE -q QUERY_TABLE [-f PREFIX] [-w WINDOW_SIZE]

optional arguments:
  -h, --help            show this help message and exit
  -r REFERENCE_TABLE, --reference_table REFERENCE_TABLE
                        full_table.tsv file for reference species
  -q QUERY_TABLE, --query_table QUERY_TABLE
                        full_table.tsv for query species
  -f PREFIX, --prefix PREFIX
                        Prefix for all output files
  -w WINDOW_SIZE, --window_size WINDOW_SIZE
                        Number of BUSCOs to be used per window (must be odd)
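To illustrate why an odd window size is convenient, here is a minimal sketch of window-based smoothing. It is an assumption about the general idea, not the implementation in lep_fusion_fission_finder.py: each BUSCO along a chromosome carries the linkage unit of its ortholog in the reference, and each position is assigned the most common unit in a window centred on it. An odd size gives the window a single central marker. The function name and unit labels are hypothetical.

```python
from collections import Counter


def windowed_majority(labels, window_size=17):
    """Smooth a position-ordered list of unit labels with a centred window.

    Each position receives the most common label among itself and its
    (window_size - 1) / 2 neighbours on each side; windows are truncated
    at the chromosome ends.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd")
    half = window_size // 2
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed
```

In this sketch a single stray marker inside a long run is outvoted by its neighbours, while a sustained switch from one unit to another is preserved as a candidate fusion boundary.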

2.) Place fusions/fissions in a phylogenetic context

map_fusion_fissions.py takes the output of lep_fusion_fission_finder.py and infers where fusion/fission events occurred in a given tree.

./map_fusion_fissions.py -i chr_assignments/ -tree spp.treefile -t 1 -o output/ -f test_run

This will write a file called mapped_fusions_fissions.tsv, which lists each fusion/fission event.

Full usage:

usage: map_fusion_fissions_client.py [-h] -i INPUT_DATA [-tree TREE] -o OUTPUT [-f PREFIX] [-t THRESHOLD] [-l LABEL_STATUS]

optional arguments:
 -h, --help            show this help message and exit
 -i INPUT_DATA, --input_data INPUT_DATA
                       path to lep_fusion_fission_finder output
 -tree TREE, --tree TREE
                       Phylogenetic tree
 -o OUTPUT, --output OUTPUT
                       output location relative to working directory
 -f PREFIX, --prefix PREFIX
                       Prefix for all output files
 -t THRESHOLD, --threshold THRESHOLD
                       Threshold for rearrangement to be shared between tips
 -l LABEL_STATUS, --label_status LABEL_STATUS
                       Specify if tree already contains internal node labels

Additional scripts:

adjust_coordinates_of_fusions.py takes the tsv file containing the fusion coordinates (generated by lep_fusion_fission_finder.py) and adjusts the final portion of each fusion chromosome such that the reported coordinate reflects the end of the chromosome (i.e. the chromosome length) rather than the position of the last detected ortholog. This produces an adjusted tsv that can be used for downstream exploration of fusions.

./adjust_coordinates_of_fusions.py  --help
usage: adjust_coordinates_of_fusions.py [-h] -f FUSIONS -i INDEX -p PREFIX

optional arguments:
  -h, --help            show this help message and exit
  -f FUSIONS, --fusions FUSIONS
                        Fusion position output from LFSF
  -i INDEX, --index INDEX
                        index file for genome
  -p PREFIX, --prefix PREFIX
                        prefix for output file
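The coordinate adjustment described for adjust_coordinates_of_fusions.py can be sketched as follows. This is an illustration under stated assumptions, not the script itself: blocks are represented as hypothetical `(chromosome, unit, start, end)` tuples because the column layout of the fusion positions file is not documented here, and the index file is assumed to be tab-separated with the sequence name and length in the first two columns (as in a samtools faidx-style index).

```python
def read_lengths(index_lines):
    """Map sequence name to length from index lines (assumed: name<TAB>length...)."""
    lengths = {}
    for line in index_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def adjust_block_ends(blocks, chrom_lengths):
    """Extend the last block of each chromosome to the chromosome length.

    blocks: list of (chromosome, unit, start, end) tuples in any order.
    Chromosomes missing from chrom_lengths are left unchanged.
    """
    # Find, for each chromosome, the index of the block with the largest end.
    last_index = {}
    for i, (chrom, _unit, _start, end) in enumerate(blocks):
        if chrom not in last_index or end > blocks[last_index[chrom]][3]:
            last_index[chrom] = i

    adjusted = []
    for i, (chrom, unit, start, end) in enumerate(blocks):
        if last_index[chrom] == i and chrom in chrom_lengths:
            end = chrom_lengths[chrom]
        adjusted.append((chrom, unit, start, end))
    return adjusted
```

Only the final block of each chromosome changes; the boundaries between ancestral blocks stay at the positions of the detected orthologs.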