I would like some confirmation on the definition of SVLEN when used with different representations of specific variants. I used Tandem Repeats as an example but that could be applicable to others.
##fileformat=VCFv4.4
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the longest variant described in this record">
##INFO=<ID=SVLEN,Number=A,Type=Integer,Description="Length of structural variant">
##INFO=<ID=CN,Number=A,Type=Float,Description="Copy number of allele">
##INFO=<ID=RN,Number=A,Type=Integer,Description="Total number of repeat sequences in this allele">
##INFO=<ID=RUS,Number=.,Type=String,Description="Repeat unit sequence of the corresponding repeat sequence">
##INFO=<ID=RUL,Number=.,Type=Integer,Description="Repeat unit length of the corresponding repeat sequence">
##INFO=<ID=RUC,Number=.,Type=Float,Description="Repeat unit count of corresponding repeat sequence">
##INFO=<ID=RB,Number=.,Type=Integer,Description="Total number of bases in the corresponding repeat sequence">
##INFO=<ID=RUB,Number=.,Type=Integer,Description="Number of bases in each individual repeat unit">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phase set">
##ALT=<ID=CNV:TR,Description="Tandem repeat determined based on DNA abundance">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
chr1 130 . G GCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG . . . GT:PS 1|0:100
chr1 130 . G <CNV:TR> . . END=130;SVLEN=1;CN=20;RUS=CAG;RN=1;RB=60 GT:PS 1|0:100
chr1 130 . G <INS> . . END=130;SVLEN=60 GT:PS 1|0:100
As I understand the spec, a tandem repeat can be represented in three ways:
- As an explicit sequence, in which case SVLEN should be empty. (Section 3 - SVLEN: "The missing value . should be used for all other ALT alleles, including ALT alleles using breakend notation")
- As <CNV:TR>, in which case it is an SV and SVLEN represents the length of the reference allele, or 1 if novel. (Section 5.7: "The SVLEN of the <CNV:TR> is the length of the reference allele. It is not the length of the <CNV:TR> allele")
- As <INS> or <DEL>, in which case SVLEN is the length of the actual inserted or deleted bases. (Section 3 - SVLEN: "SVLEN is defined for INS, DUP, INV, and DEL symbolic alleles as the number of the inserted, duplicated, inverted, and deleted bases respectively.")

The example records above show one record for each representation.
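To make my reading concrete, here is a minimal Python sketch of the three cases as I understand them. The function name `expected_svlen` and its parameters are my own, hypothetical names, not part of the spec or of any VCF library, and symbolic alleles other than the ones discussed here are deliberately left out.

```python
from typing import Optional


def expected_svlen(alt: str, ref_allele_len: int, changed_bases: int) -> Optional[int]:
    """Return the SVLEN I would expect for one ALT allele.

    Hypothetical helper that encodes my interpretation only.
    alt            -- the ALT field of the record
    ref_allele_len -- length of the reference allele (0 if the repeat is novel)
    changed_bases  -- number of inserted, duplicated, inverted, or deleted bases
    None stands for the VCF missing value '.'.
    """
    # Explicit sequences and breakend notation are not symbolic alleles:
    # Section 3 says the missing value should be used for them.
    if not (alt.startswith("<") and alt.endswith(">")):
        return None

    sv_type = alt[1:-1]

    # Section 5.7: SVLEN of <CNV:TR> is the length of the reference allele
    # (taken here as 1 when the repeat is novel), not of the repeat allele.
    if sv_type == "CNV:TR":
        return max(ref_allele_len, 1)

    # Section 3: for INS, DUP, INV, and DEL, SVLEN is the number of
    # inserted, duplicated, inverted, or deleted bases.
    if sv_type.split(":")[0] in {"INS", "DUP", "INV", "DEL"}:
        return changed_bases

    # Anything else is outside what this question covers.
    raise ValueError(f"symbolic allele <{sv_type}> is not covered by this sketch")


# The three representations of the same 20 x CAG repeat from the example records.
explicit = expected_svlen("G" + "CAG" * 20, ref_allele_len=1, changed_bases=60)
tandem = expected_svlen("<CNV:TR>", ref_allele_len=1, changed_bases=60)
insertion = expected_svlen("<INS>", ref_allele_len=1, changed_bases=60)
```

Under this reading, the explicit sequence gets the missing value, <CNV:TR> gets SVLEN=1, and <INS> gets SVLEN=60, which matches the INFO fields in the example records.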
Did I interpret the spec correctly?
I see two definitions of SVLEN: the length of the structural variant, or the length of the reference allele. I find this confusing, and I expect I won't be the only one. Was this intended?
Is it necessary to enforce the absence of an SVLEN value for non-symbolic alleles?