chewBBACA

chewBBACA is a software suite for the creation and evaluation of core genome and whole genome MultiLocus Sequence Typing (cg/wgMLST) schemas and results. "BBACA" stands for "BSR-Based Allele Calling Algorithm", where BSR is the BLAST Score Ratio proposed by Rasko DA et al. The "chew" part adds extra coolness to the name and could be thought of as "Comprehensive and Highly Efficient Workflow". chewBBACA defines the target loci in a schema based on multiple genomes (e.g. the distinct loci identified in a dataset of high-quality genomes for a species or lineage of interest) and performs allele calling to determine the allelic profiles of bacterial strains, easily scaling to thousands of genomes with modest computational resources. It also includes functionalities to annotate the schema loci, compute the set of loci that constitute the core genome for a given dataset, and generate interactive reports for evaluating schemas and allele calling results, enabling an intuitive analysis in surveillance and outbreak detection settings or population studies. Pre-defined cg/wgMLST schemas can be downloaded from Chewie-NS or adapted from other cg/wgMLST platforms.
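To make the BSR concept concrete, here is a minimal Python sketch of how a BLAST Score Ratio is generally computed: the score of a query aligned against a subject, divided by the score of the query aligned against itself. This is an illustration only, not chewBBACA's code. The function name, the raw scores and the 0.6 cutoff are assumptions chosen for the example.

```python
def blast_score_ratio(alignment_score: float, self_score: float) -> float:
    """Return alignment_score / self_score.

    alignment_score: score of the query aligned against a subject.
    self_score: score of the query aligned against itself.
    """
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return alignment_score / self_score


if __name__ == "__main__":
    # Hypothetical raw scores, for illustration only.
    query_self_score = 500.0
    hits = {"allele_A": 500.0, "allele_B": 350.0, "allele_C": 120.0}
    cutoff = 0.6  # assumed threshold for this example

    for name, score in hits.items():
        bsr = blast_score_ratio(score, query_self_score)
        label = "similar" if bsr >= cutoff else "dissimilar"
        print(f"{name}: BSR={bsr:.2f} ({label})")
```

A ratio of 1.0 means the subject scores as well as the query does against itself, and lower values indicate more divergent sequences.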

Check the documentation for implementation details and guidance on using chewBBACA.

News

3.5.0 - 2025-12-05

Added the ComputeMSA module to compute MSAs from allele calling results or from a folder containing FASTA files. The ComputeMSA module includes the following functionalities:

  • Compute loci, sample and complete MSAs based on the allelic profiles determined by chewBBACA (e.g. at the wg/cgMLST level). Gap sequences (the character used to represent gaps is -) are added whenever a locus was not identified in a sample (e.g. when working at the wgMLST level).
  • Compute an MSA for each FASTA file in a folder (just a way to run MAFFT to compute MSAs).
  • MSAs can be computed both at the protein and DNA level (i.e. by converting protein MSAs back to DNA).
  • The --output-variable option identifies the variable positions (SNVs) and creates MSAs containing only those positions. When determining variable positions, positions with gaps or ambiguous bases can be excluded (--gaps exclude and --ambiguous exclude), or included (--gaps ignore and --ambiguous ignore) if the sequences have other variable non-gap and non-ambiguous nucleotides or amino acids at those positions.
  • The SchemaEvaluator and AlleleCallEvaluator modules use the ComputeMSA module to compute the loci MSAs (SchemaEvaluator) and the complete MSA used by FastTree to compute a tree (AlleleCallEvaluator).
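The gap padding and variable-position ideas described above can be sketched in a few lines of Python. This is a simplified illustration and not the ComputeMSA implementation. It assumes the per-locus sequences are already aligned, and the function names and the `drop_gaps` / `drop_ambiguous` parameters are hypothetical. They are not a claim about how the `--gaps` and `--ambiguous` options behave internally.

```python
GAP = "-"
DNA_UNAMBIGUOUS = set("ACGT")


def concatenate_loci(loci_msas, samples):
    """Join per-locus alignments into one row per sample.

    loci_msas: list of dicts mapping sample name -> aligned sequence.
    A sample missing from a locus gets a gap-only sequence with the
    same length as that locus alignment.
    """
    rows = {sample: [] for sample in samples}
    for msa in loci_msas:
        length = len(next(iter(msa.values())))
        for sample in samples:
            rows[sample].append(msa.get(sample, GAP * length))
    return {sample: "".join(parts) for sample, parts in rows.items()}


def variable_positions(msa, drop_gaps=True, drop_ambiguous=True,
                       alphabet=DNA_UNAMBIGUOUS):
    """Return 0-based column indices with more than one distinct
    unambiguous residue. Columns containing gaps or ambiguous symbols
    are skipped when the corresponding drop_* flag is True."""
    seqs = list(msa.values())
    keep = []
    for i in range(len(seqs[0])):
        column = [seq[i].upper() for seq in seqs]
        has_gap = GAP in column
        has_ambiguous = any(c != GAP and c not in alphabet for c in column)
        if (drop_gaps and has_gap) or (drop_ambiguous and has_ambiguous):
            continue
        if len({c for c in column if c in alphabet}) > 1:
            keep.append(i)
    return keep


def subset_columns(msa, positions):
    """Build a reduced MSA that keeps only the given columns."""
    return {s: "".join(seq[i] for i in positions) for s, seq in msa.items()}


if __name__ == "__main__":
    # Toy alignments; sample s3 lacks the second locus.
    locus1 = {"s1": "ACGT", "s2": "ACGA", "s3": "ACGT"}
    locus2 = {"s1": "GG-C", "s2": "GTNC"}
    full = concatenate_loci([locus1, locus2], ["s1", "s2", "s3"])
    print(full)
    print(variable_positions(full))
    print(variable_positions(full, drop_gaps=False))
```

In the toy data, the padded column where `s1` and `s2` differ but `s3` is a gap is reported only when `drop_gaps=False`, which shows how the choice affects the resulting variable-position MSA.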

Check our Changelog to learn about the latest changes.

Citation

When using chewBBACA, please use the following citation:

Silva M, Machado MP, Silva DN, Rossi M, Moran-Gilad J, Santos S, Ramirez M, Carriço JA. 2018. chewBBACA: A complete suite for gene-by-gene schema creation and strain identification. Microb Genom 4:000166. doi:10.1099/mgen.0.000166