Assess draft genome completeness using a fast, alignment-free, k-mer hash-based approach (aaKomp). This tool uses amino acid k-mers and a multi-index Bloom filter (miBF) to estimate the completeness of genome assemblies.
Concept: Johnathan Wong and Rene L. Warren
Design and Implementation: Johnathan Wong
Under construction
```shell
git clone https://github.com/bcgsc/aakomp.git
cd aakomp
meson --prefix /path/to/install build
cd build
ninja install
```
- GCC 7+ with OpenMP
- Python 3.9+
- zlib
- meson
- ninja
- tcmalloc
- sdsl-lite
- libdivsufsort
- btllib
- libsequence
- gperftools
- numpy
- matplotlib
We recommend creating a fresh conda environment:
```shell
conda create --name aakomp
conda activate aakomp
conda install -c conda-forge -c bioconda --file requirements.txt
```
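After activating the environment, it can help to confirm that the build tools are visible before compiling. The following is a minimal sketch and not part of aaKomp: the `have` helper is a hypothetical name, and the check covers only dependencies that expose a command (`meson`, `ninja`, `python3`), not libraries such as btllib or sdsl-lite.

```shell
# Hypothetical helper (not part of aaKomp): succeeds if a command is on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Report which build tools from the dependency list are visible.
for tool in meson ninja python3; do
    if have "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done

# The dependency list asks for Python 3.9 or newer.
if have python3; then
    if python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 9) else 1)'; then
        echo "python3 is 3.9 or newer"
    else
        echo "python3 is older than 3.9"
    fi
fi
```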
You can run aaKomp either directly or using the wrapper script `run-aakomp`.
The `run-aakomp` wrapper automates:
- Checking for an existing miBF
- Building a miBF with `make_mibf` if one is missing
- Running `aakomp`
- Running post-analysis with `aakomp_score.py`
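The sequence above can be sketched in shell. This illustrates the control flow only and is not the wrapper's actual code: `plan_steps` is a hypothetical function, the miBF path is a placeholder, and the sketch prints step names rather than invoking `make_mibf`, `aakomp`, or `aakomp_score.py`, whose arguments are not documented here.

```shell
# Hypothetical sketch of the wrapper's control flow; prints steps, runs nothing.
plan_steps() {
    mibf_path=$1    # placeholder: wherever a previously built miBF would live
    if [ -e "$mibf_path" ]; then
        echo "reuse existing miBF"
    else
        echo "make_mibf"
    fi
    echo "aakomp"
    echo "aakomp_score.py"
}

# Example call with a placeholder path; the first line printed depends on
# whether something exists at that path.
plan_steps ./no-such-mibf-placeholder
```

The point of the existence check is that the miBF depends only on the protein reference, so it can be built once and reused when assessing several assemblies against the same database.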
This demo runs aaKomp on the C. elegans genome using the nematoda_odb12 ortholog protein set.
```shell
# Download the C. elegans genome
wget -nc https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/985/GCF_000002985.6_WBcel235/GCF_000002985.6_WBcel235_genomic.fna.gz
# Decompress the genome
gunzip -c GCF_000002985.6_WBcel235_genomic.fna.gz > GCF_000002985.6_WBcel235_genomic.fna
# Link the protein reference file to the current directory (assumes you have nematoda_odb12 in the current directory)
ln -sf nematoda_odb12/nematoda_odb12.faa ./
# Run aaKomp through the wrapper
run-aakomp --db-dir ./ \
    --reference nematoda_odb12.faa \
    --input GCF_000002985.6_WBcel235_genomic.fna \
    -o demo
```
`run-aakomp` options (partial list):
| Option | Description |
|---|---|
| `--input`, `-i` | Genome file (FASTA) to assess |
| `--reference`, `-r` | Protein database (FASTA, amino acid) |
| `--output`, `-o` | Output prefix |
| `--db-dir` | Directory to store miBF database |
| `--threads`, `-t` | Number of threads (default: 48) |
| `--hash`, `-H` | Number of hash functions for miBF (default: 9) |
| `--kmer`, `-k` | Amino acid k-mer size (default: 9) |
| `--lower_bound`, `-l` | Minimum occupancy threshold (default: 0.7) |
| `--rescue_kmer` | Number of consecutive k-mers to initiate a seed |
| `--max_offset` | Max distance to extend seed during chaining |
| `--track-time` | Track runtime of each major step |
| `--dry-run` | Print commands only, do not execute |
| `--verbose`, `-v` | Verbose output |
| `--debug` | Debug mode for internal troubleshooting |
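As an example of combining these options, the sketch below previews a run with a thread count matched to the machine instead of the default of 48. The flags are taken from the table above, but how they interact is an assumption, and the file names are those of the demo. The `getconf` call is a widely available system utility, not part of aaKomp, and the `command -v` guard only keeps the snippet harmless where `run-aakomp` is not installed.

```shell
# Use the number of online CPUs; fall back to 1 if getconf cannot report it.
threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

if command -v run-aakomp >/dev/null 2>&1; then
    # --dry-run prints the commands without executing them (per the table).
    run-aakomp --db-dir ./ \
        --reference nematoda_odb12.faa \
        --input GCF_000002985.6_WBcel235_genomic.fna \
        -o demo \
        --threads "$threads" \
        --dry-run
else
    echo "run-aakomp not found on PATH; would use --threads $threads" >&2
fi
```

Dropping `--dry-run` from the same command would then execute the run with the previewed settings.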
aaKomp Copyright (c) 2025 British Columbia Cancer Agency Branch. All rights reserved.

Licensed under the GNU General Public License v3. See LICENSE or http://www.gnu.org/licenses/.
For commercial licensing inquiries, contact:
Patrick Rebstein – [email protected]