IBEXCluster/Genome-Index-splitter

Genome-Index-splitter (GIS)

The "Genome-Index-splitter (GIS)" algorithm splits the chromosomes into multiple non-overlapping intervals (also called “chunks”).
GIS is an independent script. It generates a “Chromosome Split Table” that is used to distribute data in parallel across the nodes.
The parallel distribution based on the “Chromosome Split Table” is used to run CombineGVCFs/GenomicsDBImport and GenotypeGVCFs.
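The splitting step can be sketched as follows. This is a minimal illustration, not the GIS script itself. It assumes that chromosome names and lengths come from a samtools-style `.fai` index (tab-separated, with name and length in the first two columns) and that a fixed chunk size is used. The table layout `(chunk_id, chromosome, start, end)` with 1-based inclusive coordinates is a hypothetical choice. Because each chunk starts one base after the previous one ends, the intervals cover every chromosome with no overlap.

```python
from typing import Iterable, List, Tuple

def read_fai(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse (name, length) pairs from samtools .fai index lines (assumed input format)."""
    chroms = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        chroms.append((fields[0], int(fields[1])))
    return chroms

def split_table(chroms: List[Tuple[str, int]], chunk_size: int) -> List[Tuple[int, str, int, int]]:
    """Split each chromosome into consecutive, non-overlapping 1-based inclusive intervals.

    Returns rows of a hypothetical split table: (chunk_id, chromosome, start, end).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    table = []
    chunk_id = 0
    for name, length in chroms:
        start = 1
        while start <= length:
            end = min(start + chunk_size - 1, length)  # last chunk may be shorter
            chunk_id += 1
            table.append((chunk_id, name, start, end))
            start = end + 1  # next chunk begins right after this one: no overlap, no gap
    return table

if __name__ == "__main__":
    # Toy index with made-up lengths, for illustration only.
    fai = ["chr1A\t25\t7\t60\t61\n", "chr1B\t10\t40\t60\t61\n"]
    for cid, name, start, end in split_table(read_fai(fai), chunk_size=10):
        print(f"{cid}\t{name}:{start}-{end}")  # GATK-style interval string
```

Each `name:start-end` string can then be passed as an interval to the downstream tool for that chunk.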

Different versions

To reduce the execution time of SNP/INDEL calling, the reference genome can be split in several ways:

Version #1: Chromosome-based distribution, which was part of the workflow and uses conditional operators (not an independent script).
Version #2: Chunks are distributed via job arrays and executed independently, but batch by batch.
Version #3: Chunks are distributed and executed as a single job via MPI, running concurrently across the nodes (instead of batch by batch).
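The difference between Versions #2 and #3 can be sketched as two scheduling models. This is a simplified sketch under stated assumptions, not the repository's implementation. Version #2 is modelled as one job-array task per chunk, with at most `max_concurrent` tasks running at a time, so the chunks finish in successive waves. Real schedulers start a new task as soon as a slot frees up, so strict waves are a simplification. Version #3 is modelled as a single MPI job where rank `r` takes chunks `r, r + n_ranks, ...` in round-robin order, so every rank starts at once. `max_concurrent`, `n_ranks`, and both function names are hypothetical.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")

def array_waves(chunks: Sequence[T], max_concurrent: int) -> List[List[T]]:
    """Version #2 model (assumed): one job-array task per chunk; at most
    max_concurrent tasks run together, so chunks are processed batch by batch."""
    if max_concurrent <= 0:
        raise ValueError("max_concurrent must be positive")
    return [list(chunks[i:i + max_concurrent]) for i in range(0, len(chunks), max_concurrent)]

def mpi_round_robin(chunks: Sequence[T], n_ranks: int) -> Dict[int, List[T]]:
    """Version #3 model (assumed): a single MPI job; rank r processes chunks
    r, r + n_ranks, r + 2 * n_ranks, ... and all ranks run concurrently."""
    if n_ranks <= 0:
        raise ValueError("n_ranks must be positive")
    return {r: list(chunks[r::n_ranks]) for r in range(n_ranks)}

if __name__ == "__main__":
    chunks = ["chr1A:1-10", "chr1A:11-20", "chr1A:21-25", "chr1B:1-10", "chr1B:11-20"]
    print(array_waves(chunks, max_concurrent=2))
    print(mpi_round_robin(chunks, n_ranks=2))
```

In the MPI model each rank loops over its own list without waiting for a scheduler to release the next batch. This reflects the "concurrently across the nodes" behaviour described for Version #3.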

Source code availability

Version #1: https://github.com/IBEXCluster/IBEX-SNPcaller/blob/master/downstream_analysis.sh
Version #2: Wheat_genome_index_spliter.sh
Version #3: MPI_genome_index_splitter.sh