This repository collects scripts used to detect different genetic variations (GV) in DNA-seq data sets.

ACCakut/DetectionGV

 
 

Introduction

This repository collects all scripts needed to analyse whole-genome sequencing data of transformation hybrids. First, raw sequencing reads are processed in the "Raw reads analysis pipeline"; then, different genomic variations are detected in "Further analysis".
The pipeline is visualized in the following scheme:

Overview of folders and scripts

  • 0_WGSPipeline

    • This folder contains all bioinformatic scripts needed to analyse raw whole-genome sequencing reads of bacterial transformation hybrids.
  • 1_Detection

    • With the outputs from 0_WGSPipeline, different genomic variations in the hybrids' genomes can be detected. This includes:
    • For all genomic variations, the affected genes can be detected with A2_Lists2Genes.m
  • 2_GeneralScripts

    • This folder collects general scripts for sorting and converting annotations and for finding gene orthologues with BLAST.
  • dictionaries_Bacillus

    • All extra files that are needed across the different project folders for Bacillus hybrids are collected here. This includes:
      • masterlists for different donors (in ml). These lists are created with A0b_MasterListFiltering.m.
      • accessory genome lists for different donors (in acc). These lists are created with A0c_AccessoryGenome.m.
      • Other files:
        • reference files (.fasta), usually downloaded, e.g. from NCBI
        • annotation files: initially downloaded as .gff3 files and then converted to .bed.mat / .bed.txt with the script 2_GeneralScripts/Convert_gff3_to_bed.m
        • recipient specific list of multimapping regions (Bs166NCe_mm.txt created with A0d_Multimapper.m)
        • SNP artefacts (Bs166NCe_mm.txt), coverage artefacts for deletions/duplications (Bs166NCe_ArteCov.txt), and insertions, which have to be excluded in the subsequent analysis. These lists are created by running the corresponding detection scripts on mapping data of the recipient against its own reference.
  • ToyData

    • In this folder, we provide a toy data set of a hybrid sample (recipient: Bacillus subtilis, donor: Bacillus spizizenii). The files are the outputs of the raw read analysis in 0_WGSPipeline and can be used to test the detection scripts in 1_Detection.
    • If you want raw reads to test the scripts in 0_WGSPipeline, see our publication (DOI: 10.1038/s41396-023-01440-x), where the sequencing data are made available.
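
The annotation conversion mentioned under dictionaries_Bacillus (.gff3 to .bed) is mainly a coordinate shift: GFF3 start positions are 1-based and inclusive, whereas BED intervals are 0-based and half-open. The shell sketch below illustrates only that idea. It is not the repository's Convert_gff3_to_bed.m; the function name gff3_to_bed, the restriction to "gene" features, and the choice of output columns are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of the repository: print the "gene"
# features of a GFF3 file as tab-separated BED6-style lines.
gff3_to_bed() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        /^#/ { next }                      # skip header and comment lines
        NF >= 9 && $3 == "gene" {
            # GFF3: 1-based inclusive start; BED: 0-based start, end unchanged
            print $1, $4 - 1, $5, $9, ".", $7
        }' "$1"
}
```

A call such as `gff3_to_bed annotation.gff3 > annotation.bed.txt` (file names are placeholders) would write one line per gene, with the GFF3 attribute column carried over unchanged as the name field.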

How to cite our work:

  • The bioinformatic scripts and basic pipeline for the detection of genomic variations were first developed for the work published here:

Jeffrey J. Power, Fernanda Pinheiro, Simone Pompei, Viera Kovacova, Melih Yüksel, Isabel Rathmann, Mona Förster, Michael Lässig, and Berenike Maier. Adaptive evolution of hybrid bacteria by horizontal gene transfer. Proceedings of the National Academy of Sciences, 2021, DOI:10.1073/pnas.2007873118

  • In the form presented here, the scripts were published in:

Isabel Rathmann, Mona Förster, Melih Yüksel, Lucas Horst, Gabriela Petrungaro, Tobias Bollenbach, and Berenike Maier. Distribution of fitness effects of cross-species transformation reveals potential for fast adaptive evolution. The ISME Journal, 2022, DOI: 10.1038/s41396-023-01440-x

  • and extensively used and improved for:

Mona Förster, Isabel Rathmann, Melih Yüksel, Jeffrey J. Power, and Berenike Maier. Genome-wide transformation reveals extensive exchange across closely related Bacillus species. bioRxiv, 2023, DOI: 10.1101/2023.07.03.547483.

The original version of all scripts, as cited in the PhD thesis of Isabel Rathmann, is preserved on branch PhDThesis_Rathmann2023.
