BBTools container

Main tool: BBTools

Code repository: https://sourceforge.net/projects/bbmap/ and https://github.com/bbushnell/BBTools

Additional tools:

samtools: 1.23.1
htslib: 1.23.1
sambamba: 1.0.1

Basic information on how to use this tool:

executable: *.sh
help: Program descriptions and options are shown when running the shell scripts with no parameters.
version: --version
description:

BBTools is a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. BBTools can handle common sequencing file formats such as fastq, fasta, sam, scarf, fasta+qual, compressed or raw, with autodetection of quality encoding and interleaving.
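The help and version conventions listed above can be tried with a short check like the following. This is a minimal sketch: `have_tool` is a hypothetical helper introduced here for illustration, and the snippet assumes it is run somewhere the BBTools scripts may or may not be on `PATH` (for example, inside the container).

```shell
# Hypothetical helper: succeed if the named executable is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Print the BBTools version when the suite is installed; otherwise say so.
if have_tool bbversion.sh; then
    bbversion.sh
else
    echo "bbversion.sh not found on PATH; try again inside the container" >&2
fi

# Running any script with no parameters prints its description and options:
#   bbduk.sh
# Per the list above, scripts also accept --version:
#   bbduk.sh --version
```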

Additional information:

+-------------------------+----------------------------------------------------------------------------------+------------------------------------------------------------------------+
| Script                  | Purpose                                                                          | Comment                                                                |
+-------------------------+----------------------------------------------------------------------------------+------------------------------------------------------------------------+
| bbcms.sh                | Performs error correction using a Count-Min Sketch                               | Intended for metagenome assembly                                       |
| bbcountunique.sh        | Counts unique kmers in reads                                                     |                                                                        |
| bbduk.sh                | Trims, filters or masks reads using kmers                                        |                                                                        |
| bbmap.sh                | Splice-aware aligner for short reads                                             |                                                                        |
| bbmapskimmer.sh         | BBMap version designed for high levels of multimapping                           |                                                                        |
| bbmask.sh               | Masks references based on various things, such as sequence complexity            |                                                                        |
| bbmerge.sh              | Merges overlapping paired reads                                                  |                                                                        |
| bbmerge-auto.sh         | Same as bbmerge, but tries to allocate all memory on the node                    | Use this version for kmer operations like extend                       |
| bbnorm.sh               | Normalizes reads based on coverage                                               | Mainly for use prior to single-cell assembly                           |
| bbsplit.sh              | BBMap version that maps to multiple references simultaneously                    | Intended for decontamination; similar to Seal                          |
| bbversion.sh            | Prints the version of BBTools                                                    |                                                                        |
| bbwrap.sh               | Wraps BBMap to process many files using same reference                           | Saves time by loading the index only once                              |
| calctruequality.sh      | Allows recalibration of quality scores from mapped reads                         | This generates the correction matrix; BBDuk does the recalibration     |
| callgenes.sh            | Fast prokaryotic gene caller                                                     | Integrated into BBSketch                                               |
| callvariants.sh         | Fast variant caller                                                              |                                                                        |
| callvariants2.sh        | Same as callvariants.sh with the "multisample" flag                              |                                                                        |
| clumpify.sh             | Shrinks compressed fastq files, and can remove duplicate reads                   | Also supports error correction                                         |
| comparesketch.sh        | Compares sketches locally, without using a sketch server                         |                                                                        |
| crossblock.sh           | Alias for decontaminate.sh                                                       |                                                                        |
| cutgff.sh               | Cuts out features defined by gff file                                            | E.g., generates one fasta entry per gene from a gff and an assembly    |
| cutprimers.sh           | Cuts out subregions of ribosomes                                                 | Mainly for 16S analysis                                                |
| decontaminate.sh        | Pool-level decontamination for single-cell MDA-amplified genomes                 |                                                                        |
| dedupe.sh               | Removes duplicate and fully-contained sequences                                  | Can also be used to cluster 16S sequences                              |
| dedupe2.sh              | Version of dedupe that supports more hash keys for greater sensitivity           |                                                                        |
| dedupebymapping.sh      | Deduplicates reads based on mapping coordinates                                  |                                                                        |
| demuxbyname.sh          | Demultiplexes based on sequence headers                                          |                                                                        |
| filterbyname.sh         | Filters based on sequence headers                                                |                                                                        |
| filterbytaxa.sh         | Filters sequences based on taxonomic classification                              | Used with NCBI datasets                                                |
| filterbytile.sh         | Removes reads that are in low quality areas on flowcell                          |                                                                        |
| filterqc.sh             | Part of JGI's fastq filtering pipeline                                           |                                                                        |
| filtersam.sh            | Filters sam files to remove reads with multiple unsupported mismatches           | Designed for NovaSeq                                                   |
| gitable.sh              | Used to process NCBI taxonomy data                                               |                                                                        |
| khist.sh                | Alias for bbnorm.sh with flags for making a kmer frequency histogram             |                                                                        |
| kmercountexact.sh       | Counts kmers and produces a histogram                                            | Uses more memory than BBNorm but allows exact counts                   |
| kmercountmulti.sh       | Cardinality estimation over multiple kmer lengths                                | Uses LogLog; does not produce a histogram                              |
| mapPacBio.sh            | BBMap version designed for PacBio or Nanopore reads                              | Reads longer than 5kbp get broken into 5kbp shreds                     |
| mergesketch.sh          | Allows multiple sketches to be combined                                          |                                                                        |
| msa.sh                  | Alignment tool                                                                   | Used with cutprimers.sh to cut subsections out of 16S                  |
| mutate.sh               | Generates synthetic genomes by randomly mutating the input                       |                                                                        |
| muxbyname.sh            | Multiplex multiple files, renaming sequences based on input file name            | Opposite of demuxbyname.sh                                             |
| partition.sh            | Splits a sequence file into multiple files                                       |                                                                        |
| pileup.sh               | Calculates coverage from sam files                                               |                                                                        |
| plotflowcell.sh         | Produces statistics about flowcell positions                                     |                                                                        |
| processhi-c.sh          | Custom trimming for Hi-C reads                                                   | In development                                                         |
| randomreads.sh          | Generates synthetic data from real genome reference                              | Highly customizable                                                    |
| readqc.sh               | Short read quality report                                                        | Alternative to fastqc                                                  |
| reformat.sh             | Converts sequence files to another format                                        | Has many additional options, includes subsampling                      |
| rename.sh               | Renames sequences in various ways, such as adding a prefix                       |                                                                        |
| repair.sh               | Fixes broken pairing in fastq files                                              |                                                                        |
| representative.sh       | Makes a smaller subset of a reference dataset by eliminating redundancy          | Designed for use with BBSketch output                                  |
| rqcfilter2.sh           | Filtering pipeline used at JGI                                                   | portal.nersc.gov/dna/microbial/assembly/bushnell/RQCFilterData.tar     |
| seal.sh                 | Counts kmer matches between query and reference sequences                        |                                                                        |
| sendsketch.sh           | Fast taxonomic classifier using webservers at JGI                                |                                                                        |
| shred.sh                | Breaks sequences into shorter, fixed-length pieces                               |                                                                        |
| shuffle.sh              | Randomly reorders input file                                                     | Crashes if input doesn't fit in memory                                 |
| shuffle2.sh             | Randomly reorders input file                                                     | Supports larger files, but output might be less random                 |
| sketch.sh               | Makes reference sketches on a per-TaxID basis                                    |                                                                        |
| sketchblacklist.sh      | Makes sketch blacklists of common kmers                                          |                                                                        |
| sortbyname.sh           | Sorts sequences by name, length, quality, taxa, and other things                 |                                                                        |
| summarizequast.sh       | Generates box plots for multiple quast reports                                   |                                                                        |
| tadpipe.sh              | Preprocessing and assembly pipeline using tadpole                                |                                                                        |
| tadpole.sh              | Fast short read assembler                                                        |                                                                        |
| tadwrapper.sh           | Runs Tadpole with multiple kmer lengths to select the best assembly              |                                                                        |
| taxserver.sh            | Starts taxonomy and sketch servers                                               |                                                                        |
| testformat.sh           | Determines if file is fasta, fastq, interleaved, etc. by reading first few lines |                                                                        |
| testformat2.sh          | Generates extensive statistics by reading the full file                          |                                                                        |
| translate6frames.sh     | Translates nucleotide sequence into amino acid sequence in all frames            |                                                                        |
| vcf2gff.sh              | Converts vcf format to gff format                                                |                                                                        |
+-------------------------+----------------------------------------------------------------------------------+------------------------------------------------------------------------+
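To see which of the scripts in the table are actually present in a given installation, a small listing helper may be convenient. This is a sketch under assumptions: `list_bbtools_scripts` is a hypothetical function, and the default directory `/opt/bbmap` is inferred from the pipeline path quoted in the Example Usage section, so it may differ in other images.

```shell
# Hypothetical helper: print the names of the *.sh scripts in a directory,
# one per line. The default /opt/bbmap is an assumption (see lead-in).
list_bbtools_scripts() {
    local dir="${1:-/opt/bbmap}"
    local f
    for f in "$dir"/*.sh; do
        # Skip the literal pattern when the glob matches nothing.
        [ -e "$f" ] || continue
        basename "$f"
    done
}
```

For example, `list_bbtools_scripts | grep -c .` counts the scripts found in the default location.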

Full documentation: https://bbmap.org/docs

Example Usage

(adapted from /opt/bbmap/pipelines/covid/processCorona.sh)

Interleave a pair of FASTQ files for downstream processing:

reformat.sh \
    in1=${SAMPLE}_R1.fastq.gz \
    in2=${SAMPLE}_R2.fastq.gz \
    out=${SAMPLE}.fq.gz

Split into SARS-CoV-2 and non-SARS-CoV-2 reads:

bbduk.sh ow -Xmx1g \
    in=${SAMPLE}.fq.gz \
    ref=REFERENCE.fasta \
    outm=${SAMPLE}_viral.fq.gz \
    outu=${SAMPLE}_nonviral.fq.gz \
    k=25
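The two steps above can be chained per sample with a small wrapper. This is a sketch, not part of the original pipeline: `run`, `process_sample`, and the `DRY_RUN` switch are hypothetical names introduced here, and `REFERENCE.fasta` is the same placeholder used above. The wrapper names the interleaved intermediate `${sample}.fq.gz` so that the second step reads the file the first step wrote. With `DRY_RUN=1` (the default in this sketch) the commands are only printed, which makes it easy to inspect them before running anything.

```shell
# Hypothetical settings; override them in the environment as needed.
REFERENCE="${REFERENCE:-REFERENCE.fasta}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Interleave one sample's paired FASTQ files, then split the reads into
# those that match the reference (outm) and those that do not (outu).
process_sample() {
    local sample="$1"
    run reformat.sh \
        in1="${sample}_R1.fastq.gz" \
        in2="${sample}_R2.fastq.gz" \
        out="${sample}.fq.gz"
    run bbduk.sh ow -Xmx1g \
        in="${sample}.fq.gz" \
        ref="$REFERENCE" \
        outm="${sample}_viral.fq.gz" \
        outu="${sample}_nonviral.fq.gz" \
        k=25
}
```

Calling `process_sample sampleA` prints the two commands for `sampleA`; setting `DRY_RUN=0` beforehand executes them instead.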
