Support for structural variants in vcf_prepper #24
Open: nakib103 wants to merge 23 commits into Ensembl:main from nakib103:sv_support
Support for structural variant
ENSVAR-6684
Running in structural variant mode
To run this script in structural variant mode we have added a 4th parameter. If set, the bin will run in structural variant mode and base the logic on that.
A pipeline parameter, params.structural_variant, has been added to turn it on at the pipeline level.
When running in structural_variant mode, it affects the generate_vep_config and vcf_to_bed processes.
vcf_to_bed
variant group
The SVs are grouped into 5 types depending on their variant class. The details are in the doc -
https://docs.google.com/spreadsheets/d/1TfvsMBFJFfHZrRIrVkFVfQbAbfRm7LQmTujQPY6BPJw/edit?gid=0#gid=0
A new hashmap has been added and used in structural variant mode.
This PR addresses 2 more issues related to SVs in vcf_to_bed:
When an SV has sequence_alteration as its variant class, we should not try to calculate the variant class (the calculation is based on short variants) but return the class as is.
end is calculated from INFO/SVLEN and, if that is not available or SVLEN=0, from INFO/END. For insertion and breakend types the position will be updated to paint them as single-point variants, like SNV insertions. (In future, we might need to put the SVLEN information somewhere for them in case the EV design needs it.)
generate_vep_config
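The end-coordinate rule described for vcf_to_bed above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the function names, the `single_point` flag, the `start + |SVLEN|` arithmetic and the "end equals start" treatment of insertions and breakends are all assumptions made for the example.

```python
from typing import Mapping, Optional


def _to_int(value: Optional[str]) -> Optional[int]:
    """Parse the first value of an INFO field as int; None if missing or invalid."""
    if value is None:
        return None
    try:
        return int(str(value).split(",")[0])
    except ValueError:
        return None


def sv_end(start: int, info: Mapping[str, str], single_point: bool = False) -> int:
    """Hypothetical helper: pick an end coordinate for an SV record.

    start        -- start position of the record
    info         -- INFO key/value pairs, values still as strings
    single_point -- True for insertion and breakend types (assumed flag)
    """
    if single_point:
        # Assumption: a single-point variant is drawn with end == start.
        return start

    svlen = _to_int(info.get("SVLEN"))
    if svlen:
        # SVLEN present and non-zero. abs() because deletions may carry
        # a negative length. The exact offset is an assumption.
        return start + abs(svlen)

    end = _to_int(info.get("END"))
    if end is not None:
        # SVLEN missing or 0: fall back to INFO/END.
        return end

    # Neither field is usable; fall back to the start position.
    return start
```

For example, `sv_end(100, {"SVLEN": "0", "END": "180"})` falls through to INFO/END because SVLEN is 0.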
When creating vep_config INI file for VEP in structural_variant mode so -
Handling multiple source files
Non-SV related update
The get_db_name function can now match multiple databases, following the introduction of the integrated and partial databases. For example, it will match both of the following databases -
For now, just return the first one and post a warning message.
The generate_synonym_file module now adds chr-prefixed names to the synonym file by default (e.g. 1 --> chr1). Previously we had to rely on the core database to have those synonyms, but I noticed they are missing in some core databases.
The output file name from VEP has the format __VEP.vcf.gz. But if either genome or source has a . in its name then it would get truncated. The . is now replaced with _ when providing nextflow-vep with an output file name.
Test
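The output file name fix described under "Non-SV related update" can be sketched as below. This is a minimal illustration rather than the pipeline's code: the helper names are hypothetical, the genome-then-source ordering of the file name is an assumption, and the example values are made up.

```python
def sanitise_label(label: str) -> str:
    """Replace dots so the label cannot be mistaken for a file extension."""
    return label.replace(".", "_")


def vep_output_name(genome: str, source: str) -> str:
    """Hypothetical helper: build an output file name to hand to nextflow-vep.

    Assumed layout: genome, then source, then the _VEP.vcf.gz suffix.
    """
    return f"{sanitise_label(genome)}_{sanitise_label(source)}_VEP.vcf.gz"


# Made-up example: a genome label containing a dot.
print(vep_output_name("GCA_000001405.29", "dbVar"))
```

Only the labels are rewritten, so the .vcf.gz suffix keeps its dots.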
An example SV VCF file can be obtained from dbVar -
https://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/Homo_sapiens/by_study/vcf/