Comparison between different splice prediction software

YStrauch/CI-SpliceAI__Comparison

Project Links

This repository is part of the CI-SpliceAI software package published in PLOS One.

This repository compares different splice prediction tools on variant data. You may also be interested in the code to train CI-SpliceAI, the code to annotate variants offline with trained models, and the website providing online annotation of variants.

Abstract

In this project, we evaluate 6 splice prediction tools (including our own, CI-SpliceAI) on a corpus of:

  • 1,317 variants for a binary affecting/non-affecting task; and
  • 388 variants (subset of the first corpus) with annotations of their exact variant effect

This repository contains all variants and all code needed to reproduce the results.

Variant Data

Visualisations of the variants:

[Figure: Pie diagrams of the data]

[Figure: Distance from a variant to its closest splice site]

Results

Optimal Thresholds, AUC-PR, AUC-ROC, and Optimal Accuracy

| Algorithm | Coverage | AUC-PR | AUC-ROC | Optimal Threshold | Accuracy |
| --- | --- | --- | --- | --- | --- |
| MES (Sliding) | 100% | 55.68% | 52.97% | 12.5 | 53.42% |
| SQUIRLS | 100% | 91.32% | 91.17% | 0.074 | 85.64% |
| MES (VEP) | 58% | 92.52% | 89.15% | 2.109 | 86.40% |
| MMSplice (Splicing Efficiency) | 99% | 93.03% | 92.56% | 1.119 | 87.23% |
| MMSplice (Pathogenicity) | 99% | 94.13% | 92.84% | 0.961 | 88.53% |
| SpliceAI | 99% | 96.21% | 95.65% | 0.3 | 90.88% |
| CI-SpliceAI | 100% | 97.25% | 96.75% | 0.19 | 92.17% |
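
The "Coverage", "Optimal Threshold" and "Accuracy" columns can be illustrated with a minimal sketch. It assumes that each tool yields one score per variant (or no score at all), that a variant is called splice-affecting when its score is at or above the threshold, and that variants without a score are left out of the accuracy computation. These are assumptions made for illustration; the function names are hypothetical, and the repository's analysis code may handle these cases differently.

```python
from typing import Optional, Sequence, Tuple


def coverage(scores: Sequence[Optional[float]]) -> float:
    """Fraction of variants for which a tool produced any score."""
    return sum(s is not None for s in scores) / len(scores)


def best_threshold(
    scores: Sequence[Optional[float]], labels: Sequence[bool]
) -> Tuple[float, float]:
    """Return (threshold, accuracy) maximising accuracy of `score >= threshold`.

    Assumption: variants without a score (None) are skipped.
    """
    scored = [(s, y) for s, y in zip(scores, labels) if s is not None]
    # Every observed score is a candidate cut-off; infinity means "call nothing".
    candidates = sorted({s for s, _ in scored}) + [float("inf")]
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        correct = sum((s >= t) == y for s, y in scored)
        acc = correct / len(scored)
        if acc > best_acc:  # strict comparison keeps the lowest tied threshold
            best_t, best_acc = t, acc
    return best_t, best_acc


if __name__ == "__main__":
    toy_scores = [0.1, 0.4, 0.35, 0.8, None]
    toy_labels = [False, True, False, True, True]
    print(coverage(toy_scores))
    print(best_threshold(toy_scores, toy_labels))
```

Scanning only the observed scores is sufficient because accuracy can change only when the threshold crosses one of them.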

PR curves of all algorithms; CI-SpliceAI attains the highest AUC-PR

Predictive error between CI-SpliceAI and SpliceAI

Predictive error improved for the majority of data points

Exact variant effect prediction accuracy

| Algorithm | Acceptor Gain | Acceptor Loss | Donor Gain | Donor Loss |
| --- | --- | --- | --- | --- |
| MES (Sliding) | 0.00% | 1.16% | 2.33% | 2.25% |
| SpliceAI | 87.50% | 77.10% | 79.07% | 78.93% |
| CI-SpliceAI | 93.75% | 78.55% | 79.07% | 82.02% |
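
One way to read a per-effect accuracy table like the one above is sketched below. The sketch assumes that each variant carries exactly one annotated effect, that a tool reports one score per effect type, and that the predicted effect is the highest-scoring type when that score reaches a threshold. All names here are hypothetical, and the evaluation in this repository may differ (for example by also taking the predicted position into account).

```python
from typing import Dict, Iterable, Optional, Tuple

EFFECTS = ("acceptor_gain", "acceptor_loss", "donor_gain", "donor_loss")


def predicted_effect(deltas: Dict[str, float], threshold: float) -> Optional[str]:
    """Highest-scoring effect type, or None if its score is below the threshold."""
    effect = max(EFFECTS, key=lambda e: deltas[e])
    return effect if deltas[effect] >= threshold else None


def per_effect_accuracy(
    records: Iterable[Tuple[str, Dict[str, float]]], threshold: float
) -> Dict[str, float]:
    """Accuracy per annotated effect; records are (true_effect, scores) pairs."""
    totals = {e: 0 for e in EFFECTS}
    hits = {e: 0 for e in EFFECTS}
    for true_effect, deltas in records:
        totals[true_effect] += 1
        hits[true_effect] += predicted_effect(deltas, threshold) == true_effect
    # Effect types with no annotated variants are omitted from the result.
    return {e: hits[e] / totals[e] for e in EFFECTS if totals[e]}
```

With this definition, a variant whose strongest score falls below the threshold counts as a miss for its annotated effect.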

CI-SpliceAI Mispredictions

Predictive error improved for the majority of data points

Methods

These steps were taken:

CSV to VCF

The variant CSV file was parsed into VCF format and normalised (indexed, rows normalised, and left-aligned).
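
The conversion step can be pictured with a minimal sketch. It assumes hypothetical CSV columns named `chrom`, `pos`, `ref` and `alt`; the actual column names in this repository's CSV may differ. The sketch only sorts the rows and writes a bare VCF body. Left-alignment needs the reference genome and is typically delegated to a dedicated tool such as `bcftools norm`; whether this project uses that tool is not stated here.

```python
import csv
import io

VCF_HEADER = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)


def csv_to_vcf(csv_text: str) -> str:
    """Convert CSV rows (hypothetical columns: chrom, pos, ref, alt) to VCF text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Sort by chromosome name (as a string) and numeric position.
    rows.sort(key=lambda r: (r["chrom"], int(r["pos"])))
    lines = [
        "\t".join(
            # ID, QUAL, FILTER and INFO are unknown here, so they are set to "."
            [r["chrom"], r["pos"], ".", r["ref"].upper(), r["alt"].upper(), ".", ".", "."]
        )
        for r in rows
    ]
    return VCF_HEADER + "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(csv_to_vcf("chrom,pos,ref,alt\n2,200,a,G\n1,150,C,T\n1,100,G,A\n"), end="")
```

Chromosome names are compared as strings in this sketch, so "10" sorts before "2"; a real pipeline would follow the contig order of the reference genome.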

The resulting VCF file is checked into this repository, so you don't need to run the code that produces it.

Running tools

We ran all tools on the VCF file using predict.sh.

Results are checked into predictions/.

Analysis

Variant data and predictions were analysed and plotted using analysis.sh; the outputs are written to analysis/.

Setup

This project is built on bash scripts. We suggest running it on a UNIX system. It might be possible to run it on Windows using a bash environment like Git Bash; however, this is untested and unsupported.

Before running the setup code, make sure you agree to all licences of third-party components.

Please make sure to install these manual dependencies first:

Then run setup.sh which will automatically:

  • Create conda environments with SpliceAI, CI-SpliceAI and MMSplice (through kipoi)
  • Download all third-party components, such as:
    • SQUIRLS command line, jannovar annotations, database
    • the human reference genome
    • GENCODE annotations for MMSplice
  • Pre-process GENCODE annotations for MMSplice

Licensing

This work is licensed under a Creative Commons Attribution 4.0 International License.

By running this code, you are installing third-party software. It is your responsibility to ensure that you comply with all third-party licenses.
