The included scripts were applied to the PoolSNP and SNAPE VCF files of the DESTv2 project (https://dest.bio/); the filtered markers correspond to inversion_markers_v6.txt. Average frequencies of these inversion-specific marker SNPs were determined for each SNP calling method.
The calculated inversion frequencies for SNAPE can be found here and the results for PoolSNP here.
The following analyses were performed:
The average inversion frequencies obtained with the two methods were tested for correlation.
For inversions In(3R)Payne, In(2L)t, In(2R)Ns, In(3R)C, In(3R)K, In(3R)Mo and In(3L)P, several analyses of the relationship between latitude and inversion frequency were performed.
Preliminary result for In(3R)Payne from the PoolSNP dataset:
Statistical Analysis of Latitudinal Clines
North American and European samples were investigated to gain insight into the consistency of allele frequencies across years.
Preliminary result for Europe (PoolSNP dataset):
Make sure to have the following requirements installed and running:
- Python3
- R
- GNU parallel (https://www.gnu.org/software/parallel/)
Individual scripts to perform these analyses can be found in the scripts directory; the whole workflow, including the commands that call these scripts, is documented in the main.sh file. Please note that this is not a fully automated pipeline: paths and directories for intermediate files and results need to be created and adjusted manually during the analysis. Additionally, the scripts need to be called separately for the SNAPE and PoolSNP data sets.
- Download DEST data (http://berglandlab.uvadcos.io)
- Convert VCF to SYNC
- Get counts at inversion-specific marker SNPs
- Calculate average frequencies for marker SNPs
- Plot correlation of inversion markers present in PoolSNP & SNAPE
- Plot marker frequencies for each inversion occurring in the data set
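The step "Calculate average frequencies for marker SNPs" can be illustrated with a minimal Python sketch. It is not the repository's script. It assumes the standard SYNC layout (chromosome, position, reference base, then one `A:T:C:G:N:del` count column per population) and a hypothetical marker lookup that maps `(chromosome, position)` to `(inversion name, inversion-specific allele)`; the actual layout of inversion_markers_v6.txt may differ. The names `allele_freq` and `average_inversion_freqs` are illustrative.

```python
from collections import defaultdict

BASES = "ATCG"  # order of the first four count fields in a SYNC column


def allele_freq(sync_field, allele):
    """Frequency of `allele` in one SYNC population column (A:T:C:G:N:del)."""
    counts = [int(x) for x in sync_field.split(":")[:4]]
    total = sum(counts)
    if total == 0:
        return None  # no coverage at this site in this population
    return counts[BASES.index(allele)] / total


def average_inversion_freqs(sync_lines, markers):
    """Average marker-allele frequency per inversion and population.

    markers: dict mapping (chrom, pos) -> (inversion, allele); assumed layout.
    Returns {inversion: {population_index: mean frequency}}.
    """
    freqs = defaultdict(lambda: defaultdict(list))
    for line in sync_lines:
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], int(fields[1]))
        if key not in markers:
            continue  # not an inversion-specific marker SNP
        inversion, allele = markers[key]
        for pop_idx, column in enumerate(fields[3:]):
            f = allele_freq(column, allele)
            if f is not None:  # skip populations without coverage
                freqs[inversion][pop_idx].append(f)
    return {
        inv: {pop: sum(v) / len(v) for pop, v in pops.items()}
        for inv, pops in freqs.items()
    }
```

Markers without coverage in a population are skipped rather than counted as zero, so they do not pull the average down; whether the original scripts handle missing data the same way is not stated here.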
A Type III ANOVA for latitudinal cline effects was performed for all inversions on the PoolSNP data set using the R package "car". The summary of the ANOVA can be found here.
- Get the average coverage of all markers for each population as one vector in a CSV (SubsampleSyncCov).
- Get the total coverage of each marker across all populations as one vector in a CSV (CovPerMarker).
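
The two coverage summaries above could be computed along the following lines. This is a hedged sketch, not the SubsampleSyncCov or CovPerMarker scripts themselves. It assumes a SYNC input already restricted to marker SNPs and defines coverage as the sum of the A, T, C and G counts (N and deletion counts are ignored, which is an assumption). The function name `coverage_summaries` is illustrative.

```python
def coverage_summaries(sync_lines):
    """Return (mean coverage per population, total coverage per marker).

    Coverage is taken as A+T+C+G counts of each SYNC column (assumption).
    """
    per_pop_sums = []       # running coverage sum for each population
    per_marker_total = {}   # (chrom, pos) -> coverage summed over populations
    n_markers = 0
    for line in sync_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        covs = [sum(int(x) for x in col.split(":")[:4]) for col in fields[3:]]
        if not per_pop_sums:
            per_pop_sums = [0] * len(covs)
        for i, cov in enumerate(covs):
            per_pop_sums[i] += cov
        per_marker_total[(chrom, pos)] = sum(covs)
        n_markers += 1
    per_pop_mean = [s / n_markers for s in per_pop_sums] if n_markers else []
    return per_pop_mean, per_marker_total
```

Each returned vector can then be written as a single CSV row, for example with `csv.writer(handle).writerow(per_pop_mean)` from the Python standard library.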