Svirlpool: structural variant detection from long read sequencing by local assembly

Our preprint is now on bioRxiv!

Svirlpool illustration


Overview

Motivation

Long-Read Sequencing (LRS) promises great improvements in the detection of structural genome variants (SVs). However, existing methods are lacking in key areas such as the reliable detection of inserted sequence, precise genotyping of variants, and reproducible calling of variants across multiple samples. Svirlpool targets Oxford Nanopore Technologies (ONT) sequencing data using local assembly of candidate SV regions to obtain high-quality consensus sequences.

Results

Svirlpool achieves results competitive with leading methods such as Sniffles on Genome in a Bottle (GiaB) benchmarks. On trio data, Svirlpool shows favorable Mendelian consistency, which indicates great promise for clinical applications.


Table of Contents

  1. Installation
  2. Quick Start (Example Data)
  3. Workflow for Real Data
  4. Developer & Formatting Hints

Installation

You can run Svirlpool either via pre-built containers or by installing from the source code. Both methods are fully supported.

Option A: Docker or Singularity

  1. Install Docker: Follow the official instructions.

  2. Pull Image:

docker pull ghcr.io/bihealth/svirlpool:main
  3. Singularity (Alternative): If you are on an HPC system, convert the image:
singularity build svirlpool.sif docker://ghcr.io/bihealth/svirlpool:main
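
Once the image is built, commands can presumably be run much like the Docker examples in this README. The following is a sketch under assumptions, not a documented invocation: it uses `singularity run` with `--bind` and `--pwd` to mirror Docker's `-v` and `-w` options. The helper name `svirlpool_sif` and the `SVIRLPOOL_SIF` variable are hypothetical.

```shell
# Hypothetical helper: run svirlpool from the converted image.
# Assumes svirlpool.sif is in the current directory unless SVIRLPOOL_SIF is set,
# and that all input files live below the current directory (bound to /data).
svirlpool_sif() {
    singularity run \
        --bind "$(realpath .)":/data \
        --pwd /data \
        "${SVIRLPOOL_SIF:-svirlpool.sif}" \
        svirlpool "$@"
}

# Example (same arguments as for the Docker commands):
#   svirlpool_sif sv-calling --threads 1 --reference examples/muc1/data/muc1.fa ...
```

Output written below the current directory ends up on the host, because that directory is bound into the container.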

Option B: From Source (Pixi)

  1. Install Prerequisites:
# Ubuntu 24.04+
sudo apt install -y git git-lfs
curl -fsSL https://pixi.sh/install.sh | bash
  2. Clone and Setup:
git clone git@github.com:bihealth/svirlpool.git
cd svirlpool
# Pixi will manage the environment automatically on the first run

Quick Start (Example Data)

This example uses a small MUC1 test dataset to demonstrate the two-step calling process. The commands below use Docker. To run with Pixi instead, replace the docker run ... svirlpool part with pixi run svirlpool.

1. Generate Svirltile

# Create working directory
mkdir -p /tmp/workdir/result

# Run using Docker (or replace with 'pixi run svirlpool ...')
docker run --rm -v "$(realpath .)":/data -v /tmp/workdir/result:/tmp/workdir/result -w /data \
    ghcr.io/bihealth/svirlpool:main \
    svirlpool run \
        --threads 1 --samplename muc1test --workdir /tmp/workdir/result \
        --output /tmp/workdir/result/svirltile.db \
        --alignments examples/muc1/data/muc1.bam \
        --reference examples/muc1/data/muc1.fa \
        --trf examples/muc1/data/muc1.trf.bed \
        --mononucleotides examples/muc1/data/muc1.mononucleotides.lt6.bed \
        --lamassemble-mat data/lamassemble-mats/promethion.mat

2. Generate VCF

docker run \
    --rm -v "$(realpath .)":/data -v /tmp/workdir/result:/tmp/workdir/result -w /data \
    ghcr.io/bihealth/svirlpool:main \
    svirlpool sv-calling \
        --threads 1 --reference examples/muc1/data/muc1.fa \
        --input /tmp/workdir/result/svirltile.db \
        --output /tmp/workdir/result/variants.vcf.gz

Workflow for Real Data

To call SVs on your own data, follow these three stages: preparing prefab data, generating svirltiles, and calling variants.

I. Required Input Files

| File Type | Description | Requirements |
|---|---|---|
| Alignments (.bam) | Indexed long-read alignments | Generated with minimap2; must have DNA sequences and quality scores. |
| Reference (.fa) | Reference genome | Indexed with samtools faidx. |
| Matrices (.mat) | Error models for assembly | Included in the repository under data/lamassemble-mats/. |
| Annotations | TRF and Mononucleotides | Download from svirlpool-data. |
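
The index requirements above can be checked before starting a long run. The sketch below assumes the default index names written by samtools (`.bai` or `.csi` next to the BAM, `.fai` next to the FASTA). The function name `check_indexed_inputs` is hypothetical and not part of Svirlpool.

```shell
# Check that the alignment and reference files come with their indexes.
# Returns 0 if everything is present, 1 otherwise.
check_indexed_inputs() {
    local bam=$1 ref=$2 file rc=0
    for file in "$bam" "$ref" "$ref.fai"; do
        if [ ! -s "$file" ]; then
            echo "missing or empty: $file" >&2
            rc=1
        fi
    done
    # samtools index writes either a .bai or (with -c) a .csi index
    if [ ! -s "$bam.bai" ] && [ ! -s "$bam.csi" ]; then
        echo "missing BAM index: $bam.bai or $bam.csi" >&2
        rc=1
    fi
    return "$rc"
}

# If an index is missing, it can be created with:
#   samtools index HG002.bam
#   samtools faidx hs37d5.fa
# Example:
#   check_indexed_inputs HG002.bam hs37d5.fa
```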

II. Step-by-Step Execution

Prefix each command below with docker run ... or pixi run, as explained above.
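
To avoid retyping the prefix, the choice of runner can be wrapped in a small shell function. This is only a sketch: the function name `svirlpool_cmd` and the `SVIRLPOOL_RUNNER` variable are hypothetical, and the Docker branch mounts only the current directory, so it assumes all inputs and outputs live below it.

```shell
# Hypothetical wrapper: set SVIRLPOOL_RUNNER to "docker" (default) or "pixi".
svirlpool_cmd() {
    case "${SVIRLPOOL_RUNNER:-docker}" in
        docker)
            # Mounts the current directory as /data, as in the Quick Start
            docker run --rm -v "$(realpath .)":/data -w /data \
                ghcr.io/bihealth/svirlpool:main svirlpool "$@"
            ;;
        pixi)
            pixi run svirlpool "$@"
            ;;
        *)
            echo "unknown SVIRLPOOL_RUNNER: ${SVIRLPOOL_RUNNER}" >&2
            return 2
            ;;
    esac
}

# Example:
#   SVIRLPOOL_RUNNER=pixi svirlpool_cmd sv-calling --input HG002/svirltile.db ...
```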

1. Setup Environment

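# DATADIR (directory holding the downloaded annotations) and SVIRLPOOLDIR
# (the repository checkout) must be set before running these lines
export REFERENCE=hs37d5.fa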
export THREADS=16
export TRF=$DATADIR/pbsv-annotations/human_hs37d5.trf.bed
export MAT=$SVIRLPOOLDIR/data/lamassemble-mats/promethion.mat
export MNNTS=$DATADIR/HG19/hs37d5.mononucleotides.lt6.bed.gz
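
A typo in one of these paths usually only surfaces once a run has started, so it can help to validate them first. The sketch below checks that each variable is set and names an existing file. The function name `check_setup_files` is hypothetical.

```shell
# Verify that each variable from the setup step names an existing file.
# Returns 0 if all four are fine, 1 otherwise.
check_setup_files() {
    local name value rc=0
    for name in REFERENCE TRF MAT MNNTS; do
        # Look up the value of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            rc=1
        elif [ ! -f "$value" ]; then
            echo "$name points to a missing file: $value" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example:
#   check_setup_files && echo "setup looks complete"
```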

2. Generate Svirltiles (Per Sample)

Run for each sample in your study (e.g., HG002 and HG003):

svirlpool run \
    --samplename HG002 \
    --workdir HG002 \
    --alignments HG002.bam \
    --reference $REFERENCE \
    --trf $TRF \
    --lamassemble-mat $MAT \
    --mononucleotides $MNNTS \
    --threads $THREADS \
    --min-sv-size 30
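
For more than a couple of samples, the per-sample command can be wrapped in a loop. The sketch below reuses the flags shown above unchanged. It assumes each sample's alignments are named `<sample>.bam` and that the variables from the setup step are set. The function name `run_svirltiles` is hypothetical.

```shell
# Run the svirltile step for every sample name given as an argument.
# Stops at the first sample that fails.
run_svirltiles() {
    local sample
    for sample in "$@"; do
        svirlpool run \
            --samplename "$sample" \
            --workdir "$sample" \
            --alignments "$sample.bam" \
            --reference "$REFERENCE" \
            --trf "$TRF" \
            --lamassemble-mat "$MAT" \
            --mononucleotides "$MNNTS" \
            --threads "$THREADS" \
            --min-sv-size 30 || return 1
    done
}

# Example:
#   run_svirltiles HG002 HG003
```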

3. Joint SV Calling

Combine multiple tiles into a single VCF:

svirlpool sv-calling \
    --input HG002/svirltile.db HG003/svirltile.db \
    --reference $REFERENCE \
    --output family.vcf.gz \
    --sv-types DEL INS \
    --min-sv-size 50
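
A quick sanity check on the result is to count calls per SV type. The sketch below uses only gzip, awk and sort. It assumes that each record carries an `SVTYPE=` entry in the INFO column, as is conventional for structural variant VCFs; this README does not confirm that for Svirlpool's output. The function name `count_sv_types` is hypothetical.

```shell
# Print "<SVTYPE><TAB><count>" for each SV type in a gzip-compressed VCF.
count_sv_types() {
    gzip -dc "$1" |
        awk -F'\t' '
            /^#/ { next }                        # skip header lines
            {
                svtype = "UNKNOWN"
                n = split($8, fields, ";")       # column 8 is INFO
                for (i = 1; i <= n; i++)
                    if (fields[i] ~ /^SVTYPE=/)
                        svtype = substr(fields[i], 8)
                counts[svtype]++
            }
            END { for (t in counts) print t "\t" counts[t] }
        ' | sort
}

# Example:
#   count_sv_types family.vcf.gz
```

Records without an `SVTYPE` entry are reported under `UNKNOWN` rather than dropped, so the counts always add up to the number of records.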

Developer & Formatting Hints

For contributors or users running from source via pixi:

  • IDE: Open VS Code with the pre-configured environment: pixi run -e dev code .
  • Format Code: make fix
  • Lint & Tests: make check or make test
  • Full Suite: make fix check test
