This is a repo for a project involving the building of a phylogenetic tree for extant and recently extinct canids using DNA, morphological data and fossil occurrences. This project will start with a simpler version as a class project for EEOB563 at Iowa State University.

brpetrucci/canid_tree

Resolution of basal wolf-like canid divergence using ancient DNA and fossil data

This project is an abridged version of a planned, more involved project that aims to resolve uncertain details of the wolf-like canid phylogeny through combined-evidence analysis of DNA, fossil, and morphological data under a Bayesian framework. The description below covers the simpler version, carried out as a final project for EEOB563 - Molecular Phylogenetics, a graduate class at Iowa State University. Further details of data filtering, model building, model selection, and other steps will be considered in the future. See misc/final.pdf for a detailed explanation of the question, methods, and results of the analysis completed for EEOB563. A brief description of each part of the repository follows.

The main directory contains the .gitignore, this README.md file, and all the .sh files, which are batch scripts written to run the analysis scripts on the HPC-class cluster. These cover model averaging for nuclear and morphological data, nuclear-only and morphological-only analyses, and combined-evidence analysis.

misc/ contains the proposal, draft, and final version of the project report required in EEOB563.

data/ contains all the data used throughout the conception and execution of the project. taxa.tsv contains a list of fossil occurrences with their ages, while taxa_clean.tsv contains a single entry per species with its minimum and maximum age. nuclear.nex contains 583k bases of SNP data for 10 species (9 canina and 1 fox), while nuclear_full.nex contains 621k bases of SNP data for 16 specimens spanning 12 species (10 canina and 2 foxes). morpho.nex is the full morphological matrix including fossil occurrences (each of which received the same morphological scoring as its species in the original matrix), and morpho_clean.nex is the morphological matrix used for analysis with the taxa in taxa_clean.tsv.
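The relationship between taxa.tsv and taxa_clean.tsv can be illustrated with a minimal Python sketch. It is not the procedure actually used to build taxa_clean.tsv. The column names `taxon`, `min`, and `max` are assumptions (they follow a common layout for taxon age tables), and the taxon names and ages in the usage example are placeholders rather than values from the data set.

```python
import csv
import io


def collapse_occurrences(tsv_text):
    """Collapse a one-row-per-occurrence table into one age range per taxon.

    Assumes (hypothetically) a tab-separated table with a header row and
    the columns 'taxon', 'min' and 'max'.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    ranges = {}
    for row in reader:
        taxon = row["taxon"]
        lo, hi = float(row["min"]), float(row["max"])
        if taxon in ranges:
            old_lo, old_hi = ranges[taxon]
            # Keep the youngest minimum and the oldest maximum seen so far.
            ranges[taxon] = (min(old_lo, lo), max(old_hi, hi))
        else:
            ranges[taxon] = (lo, hi)
    return ranges


def to_tsv(ranges):
    """Write the collapsed ranges back out as a tab-separated table."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["taxon", "min", "max"])
    for taxon, (lo, hi) in ranges.items():
        writer.writerow([taxon, lo, hi])
    return out.getvalue()


if __name__ == "__main__":
    # Placeholder occurrences, not real data from taxa.tsv.
    example = (
        "taxon\tmin\tmax\n"
        "Taxon_A\t1.0\t2.0\n"
        "Taxon_A\t0.5\t1.5\n"
        "Taxon_B\t0.0\t0.0\n"
    )
    print(to_tsv(collapse_occurrences(example)))
```

If the real files use different column names, only the three dictionary keys in `collapse_occurrences` need to change.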

scripts/ contains all the scripts used in the analysis, including those that did not make it into the final report. RB_bug_avg_nuclear_MCMC.Rev is a script that produces a NaN likelihood when a p_inv parameter is passed to dnPhyloCTMC; it is kept here for easy access by the RevBayes development team. avg_morpho_setup.Rev sets up some parameters for the morphological model averaging analysis, while avg_morpho_MCMC.Rev runs that analysis for a given value of k, the number of states in a set of characters. avg_nuclear_MCMC.Rev runs the molecular model averaging analysis. morpho_MCMC.Rev runs the morphological-only analysis for binary characters, and nuclear_MCMC.Rev runs the nuclear-only analysis (currently using the full data set). combined_evidence_fbd.Rev runs the combined-evidence analysis for the full data set with all fossil occurrences, while combined_evidence_fbd_clean.Rev runs it only for the min-max ages data set (and includes many other updates accumulated over the course of the project).

output/ contains the output from RevBayes analyses and post-hoc tree summaries, and is a bit of a mess. All prev_ directories are simply backups from previous analyses, except for prev_output_nuclear, which contains the output from the last nuclear analysis with just 10 species. output_combined, output_combined1, etc. contain output from 4 separate runs of 300k generations each, which I ran to obtain a larger sample without exceeding the maximum run time HPC-class allows. output_combined_asc is an intermediate output and is not relevant to the EEOB563 project. output_morpho, output_morpho_avg, and the corresponding nuclear directories hold the output of the latest version of their corresponding .Rev scripts.
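Pooling several shorter runs into one larger sample can be sketched in Python as below. This is an illustration and not the script used for this project: it assumes each run wrote a tab-separated log with an identical header row, and the `burnin_fraction` parameter is a hypothetical addition (the default of 0 discards nothing).

```python
def combine_logs(log_texts, burnin_fraction=0.0):
    """Pool the sampled rows of several tab-separated MCMC logs.

    log_texts: list of strings, each the full text of one run's log
               (assumed: first non-empty line is the header).
    burnin_fraction: hypothetical knob; fraction of each run's samples
               dropped from the start before pooling.
    Returns (header, rows) with rows in run order.
    """
    header = None
    combined = []
    for text in log_texts:
        lines = [ln for ln in text.splitlines() if ln.strip()]
        if not lines:
            continue  # skip empty logs
        if header is None:
            header = lines[0]
        elif lines[0] != header:
            raise ValueError("log files have different columns")
        samples = lines[1:]
        n_drop = int(len(samples) * burnin_fraction)
        combined.extend(samples[n_drop:])
    return header, combined


if __name__ == "__main__":
    # Two tiny placeholder logs; the values are illustrative only.
    run1 = "Iteration\tPosterior\n0\t-10.0\n10\t-9.0\n20\t-8.5\n30\t-8.4\n"
    run2 = "Iteration\tPosterior\n0\t-11.0\n10\t-9.2\n20\t-8.6\n30\t-8.3\n"
    header, rows = combine_logs([run1, run2], burnin_fraction=0.25)
    print(header)
    print("\n".join(rows))
```

Note that the iteration column repeats across runs after pooling, so downstream tools that expect strictly increasing iteration numbers may need that column rewritten.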

I would like to acknowledge Rachel Rompala and Mihir Kharate, who gave me valuable feedback during the peer review part of this project.
