JGI-Bioinformatics/racon

 
 

Racon

Consensus module for raw de novo DNA assembly of long uncorrected reads.

Description

Racon is intended as a standalone consensus module to correct raw contigs generated by rapid assembly methods that do not include a consensus step, such as Miniasm.
The goal of Racon is to generate a genomic consensus of similar or better quality than the output of assembly methods that employ both error-correction and consensus steps, while being several times faster than those methods.
Racon takes only three files as input: the raw contigs in FASTA format, the original raw reads in FASTQ format, and an overlap file in PAF or MHAP format containing overlaps between the reads and the contigs.
Overlaps can be generated quickly using Minimap and piped through to Racon (see example usage below).
Racon then reads the overlaps, filters them, and aligns the region defined by each overlap.
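
To make the overlap input more concrete, here is a minimal Python sketch that reads the 12 mandatory tab-separated columns of a PAF line and reports the contig region the overlap defines. It is only an illustration: the names PafRecord and parse_paf_line are hypothetical, the example line is made up, and this is not how Racon itself parses or filters overlaps.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns, in file order."""
    query_name: str
    query_length: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_length: int
    target_start: int
    target_end: int
    matches: int
    block_length: int
    mapq: int


def parse_paf_line(line: str) -> PafRecord:
    """Parse one PAF line; optional tag columns after the 12th are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 PAF columns, got {len(fields)}")
    f = fields[:12]
    return PafRecord(
        f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
        f[5], int(f[6]), int(f[7]), int(f[8]),
        int(f[9]), int(f[10]), int(f[11]),
    )


# Made-up example: a read mapped to a contig (not taken from the test data).
example = "read1\t1000\t50\t950\t+\tcontig1\t48000\t1200\t2100\t700\t900\t255"
record = parse_paf_line(example)
print(record.target_name, record.target_start, record.target_end)
```

Here the read is the query and the contig is the target, matching the Minimap invocation used later in this README (contigs first, reads second).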

Alternatively, instead of overlaps, alignments (in SAM format) of the raw reads mapped to the contigs can be provided. We recommend GraphMap for generating the alignments. In this case, the --sam option needs to be specified on the command line.

Please note: Racon depends on quality values - the input reads need to be in the FASTQ format, or alternatively, the SAM file needs to have them included.

There is also a default QV threshold of 10 (Phred score). If your data is of poorer quality or its QVs are calibrated differently, you can change this threshold with --bq FLOAT. Setting FLOAT to -1 turns QV filtering off.
QV filtering is applied to each window separately. For a given window (500 bp long on the backbone by default), the parts of all reads that overlap the window are extracted together with their quality values. If the overlapping part of a read has a quality below the threshold, that read is excluded from the consensus for that window. Because this is done per window, only the low-quality parts of a read are ignored, rather than, e.g., the entire read being filtered from the dataset.
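
The per-window filtering can be sketched in Python as follows. This is an illustration under stated assumptions, not Racon's implementation: it assumes the quality of a read segment is the arithmetic mean of its Phred scores, that quality strings use the Phred+33 encoding, and that the segments overlapping a window have already been extracted. All function names are hypothetical.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def keep_segment(quality: str, threshold: float = 10.0) -> bool:
    """Decide whether a read segment takes part in a window's consensus.

    A threshold of -1 disables filtering, mirroring --bq -1.
    Using the mean score is an assumption made for this sketch.
    """
    if threshold == -1:
        return True
    return mean_phred(quality) >= threshold


def filter_window(segments, threshold: float = 10.0):
    """Keep the (read_name, sequence, quality) tuples that pass the threshold."""
    return [s for s in segments if keep_segment(s[2], threshold)]


# Two hypothetical read segments overlapping the same window.
window = [
    ("read_a", "ACGT", "5555"),  # '5' encodes Phred 20
    ("read_b", "ACGT", "&&&&"),  # '&' encodes Phred 5
]
kept = filter_window(window)
print([name for name, _, _ in kept])  # only read_a passes the default threshold
```

Note that read_b is dropped only for this window; its segments in other windows would be judged separately.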

Quick start

Clone and make Racon:

git clone https://github.com/isovic/racon.git  && cd racon && make modules && make tools && make -j  

Run an example script:

./example1-paf-lambda.sh   

Tip: Running Racon iteratively will produce better consensus sequences, but remember to re-run the overlap/alignment of your reads against the consensus sequence from the previous iteration before each new round.
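
One way to organise such iterations is sketched below in Python. It only reuses the minimap and racon command lines shown in the Usage section of this README; the function names, the number of rounds, and the output file naming scheme are assumptions made for illustration.

```python
import subprocess


def polishing_commands(reads, contigs, rounds=2, prefix="consensus"):
    """Build (minimap_cmd, racon_cmd, paf_path) tuples, one per round.

    File names such as consensus.round1.paf are a hypothetical convention.
    """
    steps = []
    backbone = contigs
    for i in range(1, rounds + 1):
        paf = f"{prefix}.round{i}.paf"
        out = f"{prefix}.round{i}.fasta"
        # Reads are re-mapped to the current backbone in every round.
        minimap_cmd = ["tools/minimap/minimap", backbone, reads]
        racon_cmd = ["bin/racon", reads, paf, backbone, out]
        steps.append((minimap_cmd, racon_cmd, paf))
        backbone = out  # the next round polishes this round's consensus
    return steps


def run_polishing(reads, contigs, rounds=2, prefix="consensus"):
    """Run all rounds; needs minimap and racon built as described in this README."""
    for minimap_cmd, racon_cmd, paf in polishing_commands(reads, contigs, rounds, prefix):
        with open(paf, "w") as handle:
            subprocess.run(minimap_cmd, stdout=handle, check=True)
        subprocess.run(racon_cmd, check=True)


# Dry run: print the commands without executing anything.
steps = polishing_commands(
    "test-data/lambda/reads.fastq",
    "test-data/lambda/layout-miniasm.gfa.fasta",
)
for minimap_cmd, racon_cmd, paf in steps:
    print(" ".join(minimap_cmd), ">", paf)
    print(" ".join(racon_cmd))
```

Calling run_polishing with the same arguments would execute the rounds from the repository root.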

Updates

  • Significant reduction in memory consumption: 3x less memory required for C. elegans compared to our preprint

Dependencies

  1. gcc >= 4.8
  2. Zlib - sudo apt-get install zlib1g-dev

Optional:
MUMmer needs to be installed to successfully execute the example scripts (for evaluation purposes).
On Ubuntu-based systems:

sudo apt-get install mummer  

Numpy and Matplotlib are also required for evaluation purposes:

sudo apt-get install python-numpy  
sudo apt-get install python-matplotlib  

Installation

git clone https://github.com/isovic/racon.git && cd racon && make modules && make tools && make -j  

Usage

bin/racon [options] <reads.fastq> <mappings.paf/mhap/sam> <raw_contigs.fasta/fastq/gfa> <out_consensus.fasta>  

Racon depends on quality values - the reads file/SAM file needs to have them included.
For detailed info on various options, run bin/racon without arguments.

Racon can also be run in a pipeline with other tools: overlaps are read directly from stdin if the <mappings.paf> parameter is set to -.

Further, the backbone sequences can also be provided as a GFA file, which is output directly by the Miniasm layout tool.

Concrete examples can be found below.

Consensus from mappings/overlaps in PAF/MHAP format

Generate mappings of reads to the layout using, e.g., Minimap:

tools/minimap/minimap test-data/lambda/layout-miniasm.gfa.fasta test-data/lambda/reads.fastq > test-data/lambda/mappings.paf  

Important note: mappings.paf is not the same as the overlaps used to generate the original assembly. Once the assembly is generated (e.g. with Minimap for overlapping and Miniasm for layout), the reads need to be mapped to the layout. In the example above this is done with Minimap in its mapping mode, which is the default usage with no additional parameters.

Run Racon on the mappings:

bin/racon test-data/lambda/reads.fastq test-data/lambda/mappings.paf test-data/lambda/layout-miniasm.gfa.fasta test-data/lambda/consensus.fasta  

Alternatively, the same can be done using a one-liner:

tools/minimap/minimap test-data/lambda/layout-miniasm.gfa.fasta test-data/lambda/reads.fastq | bin/racon test-data/lambda/reads.fastq - test-data/lambda/layout-miniasm.gfa.fasta test-data/lambda/consensus.fasta  

Mappings can also be provided in the MHAP format by specifying the --mhap option, e.g.:

bin/racon --mhap test-data/lambda/reads.fastq test-data/lambda/mappings.mhap test-data/lambda/layout-miniasm.gfa.fasta test-data/lambda/consensus.fasta  

Consensus from SAM alignments

Usage is the same as when loading PAF files, with only two changes:

  1. Instead of a path to a PAF file containing overlaps, provide a path to a SAM file containing alignments. Alignments can be generated by any mapper/aligner.
  2. Provide an additional --sam option.

Example (provided you generated the test-data/lambda/alignments.sam):

bin/racon --sam test-data/lambda/reads.fastq test-data/lambda/alignments.sam test-data/lambda/layout-miniasm.gfa.fasta test-data/lambda/consensus.fasta  

Error-correction

Racon can be used as a read error-correction tool as well. In this scenario, the PAF/MHAP file needs to contain actual overlaps instead of mappings. These can be obtained using e.g.:

tools/minimap/minimap -Sw5 -L100 -m0 -t8 test-data/lambda/reads.fastq test-data/lambda/reads.fastq > test-data/lambda/overlaps.paf  

Then, Racon can be run as:

bin/racon --erc test-data/lambda/reads.fastq test-data/lambda/overlaps.paf test-data/lambda/reads.fastq test-data/lambda/erc-reads.fasta  

Contact information

For additional information, help and bug reports please send an email to one of the following: [email protected], [email protected], [email protected], [email protected]

Acknowledgment

This work has been supported in part by the Croatian Science Foundation under the project UIP-11-2013-7353.
IS is supported in part by the Croatian Academy of Sciences and Arts under the project "Methods for alignment and assembly of DNA sequences using nanopore sequencing data".
NN is supported by funding from A*STAR, Singapore.
