CONGA

(COpy Number variation Genotyping in Ancient genomes and low-coverage sequencing data)

CONGA is a genotyping algorithm for Copy Number Variations (large deletions and duplications) in ancient genomes. It is tailored for calling homozygous and heterozygous CNV genotypes at low depths of coverage using read-depth and read-pair information from a BAM file with Illumina short single-end reads.

Please feel free to send me an e-mail (asoylev@gmail.com), or better yet open an issue for your questions.

Requirements

CONGA is developed and tested using Linux Ubuntu operating system

htslib (included as submodule; http://htslib.org/)
- libbz2, liblzma, libcurl are required by htslib
sonic (included as submodule; https://github.com/calkan/sonic)

Installing development libraries (requires sudo access): "sudo apt-get install zlib1g-dev libbz2-dev liblzma-dev libcurl4-openssl-dev"

Downloading, compiling and running

git clone https://github.com/asylvz/CONGA --recursive
cd CONGA && make libs && make

./conga -i myinput.bam --ref human_g1k_v37.fasta --sonic human_g1k_v37.sonic  \
	--dels known_dels.bed --dups known_dups.bed --out myoutput

If you use a X86_64 Linux machine, you can directly use our binary file (under "Relases") after "sudo chmod 755 conga_v1.0_X86_64"

Compiling and running without sudo access

If you do not have root access to install liblzma and/or libbz2, you can compile htslib without CRAM support. However, the libz library is still required, please talk to your admin if it is not available on your system.

make nocram

./conga-nocram -i myinput.bam --ref human_g1k_v37.fasta --sonic human_g1k_v37.sonic  \
	--dels known_dels.bed --dups known_dups.bed --out myoutput

Docker Usage

Another alternative to run CONGA is using Docker

cd docker
docker build . -t conga:latest

Your image named "conga" should be ready. You can run CONGA using this image by

docker run --user=$UID -v /home/projects/conga:/input -v /home/projects/conga:/output conga -i /input/myinput.bam --sonic /input/human_g1k_v37.sonic --ref /input/human_g1k_v37.fasta --dels /input/known_dels.bed --dups /input/known_dups.bed --out /output/mydockertest

Alternatively, you can pull from Docker Hub:

docker pull asylvz/conga

SONIC file (required)

You need to input a SONIC file as input to CONGA (--sonic). This file contains some annotation based on the reference genome that you use. You can use one of the already created ones from: https://github.com/BilkentCompGen/sonic-prebuilt

human_g1k_v37.sonic: SONIC file for Human Reference Genome GRCh37 (1000 Genomes Project version)
ucsc_hg19.sonic: SONIC file for the human reference genome, UCSC version build hg19.
ucsc_hg38.sonic: SONIC file for the human reference genome build 38.

If you are working with a different reference genome, you need to create the SONIC file yourself. This is a straightforward process; please refer to the SONIC development repository: https://github.com/calkan/sonic/

Sample Genotype file (required)

You can use the "svcalls.sh" script under /scripts to generate CNV calls from the 1K Phase 3 SV call set

1	668630		850204
1	963826		974172
1	1171539		1179729
1	1249799		1265722
1	2374226		2379823
...

The columns are "Chromosome Name" (TAB) "Start Position of a CNV" (TAB) "End Postion of a CNV"
This file should be seperate for duplications and deletions if both are to be genotyped.

Sample Mappability File (optional)

1	63913643	63913648	0.2
1	63913648	63913649	0.25
1	63913649	63913653	0.5
1	63913653	63913659	0.333333
...

Using a mappability file (--mappability) increases the accuracy of CONGA's predictions. We used the 100-mer mappability file from http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeMapability/ and converted the bigWig file into a BED file using "bigWigToBedGraph".

The columns are "Chromosome Name" (TAB) "Start Position of a CNV" (TAB) "End Postion of a CNV" (TAB) "Mappability value"
- Note that the mappability value should be between [0,1], where lower values indicate lower mappability intervals, i.e., repeat-rich regions, etc.

All parameters

--input 		[BAM file]         : Input files in sorted and indexed BAM format. (required)
--out   		[output prefix]    : Prefix for the output file names. (required)
--ref   		[reference genome] : Reference genome in FASTA format. (required)
--sonic 		[sonic file]       : SONIC file that contains assembly annotations. (required)
--dels          	[bed file]         : Known deletion SVs in bed format
--dups          	[bed file]         : Known duplication SVs in bed format
--mappability   	[bed file]         : Mappability file in BED format
--first-chr     	[chromosome index] : The index of the first chromosome for genotyping in your BAM.
--last-chr      	[chromosome index] : The index of the last chromosome for genotyping in your BAM.
--min-read-length	[integer]	   : Minimum length of a read to be processed for RP (default: 60 bps)
--min-sv-size		[integer]	   : Minimum length of a CNV (default: 1000 bps)
--min-mapq		[integer]	   : Minimum mapping quality threshold for reads (default: -1)
--c-score               [float]            : Minimum c-score to filter variants (More conservative with lower values, default: 0.5).
--rp                    [integer]          : Enable split-read and set minimum read-pair support for a duplication (Suggested for >5x only).

Information:
--version                  		   : Print version and exit.
--help 		                           : Print this help screen and exit.

Citation

Arda Söylev, Sevim Seda Çokoglu, Dilek Koptekin, Can Alkan, and Mehmet Somel. "CONGA: Copy number variation genotyping in ancient genomes and low-coverage sequencing data." PLOS Computational Biology 18, no. 12 (2022): e1010788. https://doi.org/10.1371/journal.pcbi.1010788

Name	Name	Last commit message	Last commit date
Latest commit asylvz Update readme.txt Apr 14, 2023 58d4fe4 · Apr 14, 2023 History 87 Commits
docker	docker	Added docker build option	Nov 24, 2022
htslib @ dd6f0b7	htslib @ dd6f0b7	Adding submodules again	Oct 11, 2019
scripts	scripts	Update readme.txt	Apr 14, 2023
sonic @ 03f8a89	sonic @ 03f8a89	Adding submodules again	Oct 11, 2019
.gitmodules	.gitmodules	Adding submodules again	Oct 11, 2019
LICENSE	LICENSE	Create LICENSE	Dec 21, 2022
Makefile	Makefile	Update Makefile	Nov 24, 2022
README.md	README.md	Update README.md	Mar 8, 2023
bam_data.c	bam_data.c	Added --min-mapq and rp is invoked by --rp n	Mar 9, 2022
bam_data.h	bam_data.h	Added kmer calculation and exclude regions	Apr 27, 2020
cmdline.c	cmdline.c	Minor fix	Mar 17, 2022
cmdline.h	cmdline.h	Fixed help menu and version printing issue	Nov 27, 2021
common.c	common.c	Added c-score filter parameter	Mar 17, 2022
common.h	common.h	Minor fix	Mar 17, 2022
free.c	free.c	Fixed memory issue	Jun 1, 2020
free.h	free.h	Added paired-end support with split reads	Dec 9, 2019
likelihood.c	likelihood.c	Added N/A for variants in a specific range	Jul 30, 2022
likelihood.h	likelihood.h	Various improvements	May 28, 2020
read_distribution.c	read_distribution.c	Updated output files' content	Feb 15, 2022
read_distribution.h	read_distribution.h	Fixed window size	Feb 20, 2022
split_read.c	split_read.c	Removed unnecessary codes	Nov 24, 2021
split_read.h	split_read.h	Removed unnecessary codes	Nov 24, 2021
svdepth.c	svdepth.c	Added c-score filter parameter	Mar 17, 2022
svdepth.h	svdepth.h	Added paired-end support with split reads	Dec 9, 2019
svs.c	svs.c	Updated output files' content	Feb 15, 2022
svs.h	svs.h	Updated output files' content	Feb 15, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CONGA

Requirements

Downloading, compiling and running

Compiling and running without sudo access

Docker Usage

SONIC file (required)

Sample Genotype file (required)

You can use the "svcalls.sh" script under /scripts to generate CNV calls from the 1K Phase 3 SV call set

Sample Mappability File (optional)

All parameters

Citation

About

Releases 2

Packages

Languages

License

asylvz/CONGA

Folders and files

Latest commit

History

Repository files navigation

CONGA

Requirements

Downloading, compiling and running

Compiling and running without sudo access

Docker Usage

SONIC file (required)

Sample Genotype file (required)

You can use the "svcalls.sh" script under /scripts to generate CNV calls from the 1K Phase 3 SV call set

Sample Mappability File (optional)

All parameters

Citation

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages