Darren J. Lin edited this page Oct 9, 2022 · 16 revisions

Welcome to the SVision wiki!

General Introduction

SVision is designed to detect genomic structural variants (SVs) from either reads or contigs, and is especially suited to detecting complex structural variants (CSVs). In particular, SVision adopts a targeted multi-object recognition deep neural network that detects and characterizes SVs from sequence similarity images. SVision also uses a graph data structure, built from the similarity images, to depict and compare CSVs.
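To give a rough idea of what a sequence similarity image is, the sketch below builds a binary dot plot between a reference segment and a read. A cell is set when both sequences share the same k-mer at those positions, so matching stretches form diagonal lines and an SV shows up as a break or shift in the diagonal. This is a generic, forward-strand-only illustration. SVision's actual image encoding is not described on this page, and the names `similarity_matrix` and `render` are hypothetical.

```python
def similarity_matrix(ref: str, query: str, k: int = 3) -> list[list[int]]:
    """Binary dot plot: cell [i][j] is 1 when ref[i:i+k] == query[j:j+k].

    Hypothetical helper; forward strand only (inversions would also need
    a reverse-complement pass).
    """
    rows = len(ref) - k + 1
    cols = len(query) - k + 1
    if rows <= 0 or cols <= 0:
        return []  # at least one sequence is shorter than k
    # Index every k-mer of the query by its start positions.
    positions: dict[str, list[int]] = {}
    for j in range(cols):
        positions.setdefault(query[j:j + k], []).append(j)
    # Mark each (reference position, query position) pair sharing a k-mer.
    matrix = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in positions.get(ref[i:i + k], []):
            matrix[i][j] = 1
    return matrix


def render(matrix: list[list[int]]) -> str:
    """Draw the matrix as text: '*' for a shared k-mer, '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


if __name__ == "__main__":
    reference = "AAACCCGGGTTT"
    read = "AAACCCTTT"  # the same sequence with "GGG" deleted
    print(render(similarity_matrix(reference, read)))
```

For this toy deletion, the rows spanning the deleted GGG stay empty and the diagonal resumes at an offset once the read rejoins the reference.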

Scope of SVision

SVision detects SVs from actual sequences, including sequencing reads from different platforms and assembled contigs.

Additional performance comparisons of SVision are listed on the Performance evaluation page.

Running SVision

Detection

Please first download the trained model.

SVision [parameters] -o <output path> -b <input bam path> -g <reference> -m <model path> -n <sample name>
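When running SVision from a pipeline script, it can help to assemble the command above as an argument list rather than a single string. The sketch below uses only the flags documented on this page; `build_svision_command` and every path in the example are hypothetical placeholders.

```python
import shlex
from typing import Sequence


def build_svision_command(
    output_dir: str,
    bam_path: str,
    reference: str,
    model_path: str,
    sample_name: str,
    options: Sequence[str] = (),
) -> list[str]:
    """Assemble an SVision command line as an argument list (hypothetical helper).

    `options` holds optional parameters such as ["-t", "8", "--graph"]; they
    are placed before the required arguments, mirroring the usage line.
    """
    cmd = ["SVision", *options]
    cmd += [
        "-o", output_dir,   # output path
        "-b", bam_path,     # input BAM
        "-g", reference,    # reference genome
        "-m", model_path,   # downloaded trained model
        "-n", sample_name,  # sample name
    ]
    return cmd


if __name__ == "__main__":
    cmd = build_svision_command(
        "results/", "sample.bam", "reference.fa", "path/to/trained_model", "sampleName",
        options=["-t", "8", "--graph", "--qname"],
    )
    print(shlex.join(cmd))  # shell-safe string, useful for logging
    # To execute it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.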

With --graph and --qname activated, SVision produces three files, plus a GFA file for each CSV:

1. sampleName.graph.vcf: Detected SVs of a given sample (sampleName) in VCF format

2. sampleName.graph_exactly_match.txt: Identified isomorphic CSV graphs

3. sampleName.graph_symmetry_match.txt: Topologically symmetric graphs

4. CSV.gfa: The graph representation of each CSV event, written under the ./graph directory
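The exact-match file groups CSVs whose graphs are isomorphic, i.e. identical up to a relabeling of nodes. As a rough illustration of that idea, the sketch below checks whether two small directed graphs are isomorphic by brute force. It is not SVision's matching code: how SVision builds nodes and edges from similarity images, and how it determines topological symmetry, are not described here, and `are_isomorphic` is a hypothetical name.

```python
from itertools import permutations


def are_isomorphic(nodes_a, edges_a, nodes_b, edges_b) -> bool:
    """Brute-force isomorphism test for two small directed graphs.

    nodes_*: iterables of distinct node ids; edges_*: iterables of (src, dst).
    Tries every bijection from nodes_a to nodes_b, so it is only practical
    for graphs with a handful of nodes.
    """
    nodes_a, nodes_b = list(nodes_a), list(nodes_b)
    set_a, set_b = set(edges_a), set(edges_b)
    # Cheap rejections: different node or edge counts cannot match.
    if len(nodes_a) != len(nodes_b) or len(set_a) != len(set_b):
        return False
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        # Relabel every edge of graph A and compare with graph B.
        if {(mapping[u], mapping[v]) for u, v in set_a} == set_b:
            return True
    return False


if __name__ == "__main__":
    # Two hypothetical CSV graphs with the same shape but different labels.
    print(are_isomorphic([1, 2, 3], [(1, 2), (2, 3)],
                         ["x", "y", "z"], [("x", "y"), ("y", "z")]))
```

Trying every permutation grows factorially with the node count. For larger graphs, `networkx.is_isomorphic` (a VF2-based matcher) is the usual choice.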

Detect by region or chromosome

SVision can restrict detection to a specific region or a single chromosome with the -c option.

  • Specific region: chrom:start-end
  • Single chromosome: e.g., chr1 or 1, depending on the chromosome names used in your reference file.
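The -c values above can be checked before launching a run. The sketch below parses either form into a chromosome name and optional coordinates. `parse_region` and `Region` are hypothetical; whether SVision accepts thousands separators, or treats coordinates as 0- or 1-based, is not stated on this page.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    chrom: str
    start: Optional[int]  # None when a whole chromosome is requested
    end: Optional[int]


# "chrom" or "chrom:start-end"; commas inside numbers are tolerated (assumption).
_REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+)(?::(?P<start>[\d,]+)-(?P<end>[\d,]+))?$")


def parse_region(text: str) -> Region:
    """Parse a -c style value into a Region, raising ValueError if malformed."""
    m = _REGION_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a valid region: {text!r}")
    chrom = m.group("chrom")
    if m.group("start") is None:
        return Region(chrom, None, None)  # single chromosome
    start = int(m.group("start").replace(",", ""))
    end = int(m.group("end").replace(",", ""))
    if start > end:
        raise ValueError(f"start {start} is after end {end} in {text!r}")
    return Region(chrom, start, end)


if __name__ == "__main__":
    print(parse_region("chr1:1000000-2000000"))
    print(parse_region("1"))
```

The chromosome name itself is not validated; it still has to match a sequence name in your reference file.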

Filtering

Please visit SVisionUtil for supporting scripts and the HG00733 whole-genome calls used in this study.

Optional parameters

| Parameter | Description | Default |
|---|---|---|
| -t | Number of threads | 1 |
| -s | Minimum number of supporting reads required for SV calling | 5 |
| -c | Specific region or chromosome to detect | Whole genome |
| --hash | Activate local realignment for unmapped sequences (experimental) | False |
| --qname | Report supporting read names for each SV call | False |
| --graph | Report a graph for each CSV call | False |
| --contig | Activate contig mode | False |
| --min_mapq | Minimum mapping quality of reads to consider | 10 |
| --min_sv_size | Minimum SV size to detect | 50 |
| --max_sv_size | Maximum SV size to detect | 1 Mbp |
| --window_size | Sliding window size used when processing the BAM file | 10 Mbp |
| --partition_max_distance | Maximum distance between signature partitions | 5 kbp |
| --cluster_max_distance | Maximum clustering distance for a partition | 0.3 |
| --batch_size | Batch size for the CNN prediction model | 128 |
| --min_gt_depth | Minimum number of reads required for genotyping | 4 |
| --homo_thresh | Minimum variant allele frequency to be called as homozygous | 0.8 |
| --hete_thresh | Minimum variant allele frequency to be called as heterozygous | 0.2 |
| --k_size | k-mer size used in local realignment | 10 |
| --min_accept | Minimum match length for realignment | 50 |
| --max_hash_len | Maximum length of unmapped sequence considered for local realignment | 1 kbp |
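Taken together, --min_gt_depth, --homo_thresh and --hete_thresh suggest a genotyping rule based on variant allele frequency (VAF). The sketch below shows one plausible form of that rule using the documented defaults. The inclusive comparisons, computing VAF as supporting reads over total reads, the `./.` result for low-depth sites, and the name `assign_genotype` are all assumptions, not SVision's confirmed behavior.

```python
def assign_genotype(support_reads: int, total_reads: int,
                    min_gt_depth: int = 4,
                    homo_thresh: float = 0.8,
                    hete_thresh: float = 0.2) -> str:
    """Hypothetical VAF-based genotyping rule using the documented defaults."""
    if not 0 <= support_reads <= total_reads:
        raise ValueError("support_reads must be between 0 and total_reads")
    if total_reads == 0 or total_reads < min_gt_depth:
        return "./."  # too few reads to genotype (assumed convention)
    vaf = support_reads / total_reads  # assumed VAF definition
    if vaf >= homo_thresh:
        return "1/1"  # homozygous alternate
    if vaf >= hete_thresh:
        return "0/1"  # heterozygous
    return "0/0"      # below both thresholds


if __name__ == "__main__":
    for support, total in [(18, 20), (6, 20), (2, 20), (2, 3)]:
        print(support, total, assign_genotype(support, total))
```

Raising --hete_thresh makes heterozygous calls more conservative, while lowering --homo_thresh lets more high-VAF sites be reported as homozygous.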

Output format

Please refer to the Output format page for all results produced by SVision.