Commit 3d38893

Replaced $log_{10}$ with $\log_{10}$ for improved formatting in VCF.
1 parent 188ec0e commit 3d38893

3 files changed: +16 -16 lines changed

VCFv4.2.tex

+4-4
@@ -181,7 +181,7 @@ \subsubsection{Fixed fields}
181181
\item ID - identifier: Semicolon-separated list of unique identifiers where available. If this is a dbSNP variant it is encouraged to use the rs number(s). No identifier should be present in more than one data record. If there is no identifier available, then the missing value should be used. (String, no whitespace or semicolons permitted)
182182
\item REF - reference base(s): Each base must be one of A,C,G,T,N (case insensitive). Multiple bases are permitted. The value in the POS field refers to the position of the first base in the String. For simple insertions and deletions in which either the REF or one of the ALT alleles would otherwise be null/empty, the REF and ALT Strings must include the base before the event (which must be reflected in the POS field), unless the event occurs at position 1 on the contig in which case it must include the base after the event; this padding base is not required (although it is permitted) for e.g.\ complex substitutions or other events where all alleles have at least one base represented in their Strings. If any of the ALT alleles is a symbolic allele (an angle-bracketed ID String ``$<$ID$>$'') then the padding base is required and POS denotes the coordinate of the base preceding the polymorphism. Tools processing VCF files are not required to preserve case in the allele Strings. (String, Required).
183183
\item ALT - alternate base(s): Comma separated list of alternate non-reference alleles. These alleles do not have to be called in any of the samples. Options are base Strings made up of the bases A,C,G,T,N,*, (case insensitive) or an angle-bracketed ID String (``$<$ID$>$'') or a breakend replacement string as described in the section on breakends. The `*' allele is reserved to indicate that the allele is missing due to a upstream deletion. If there are no alternative alleles, then the missing value should be used. Tools processing VCF files are not required to preserve case in the allele String, except for IDs, which are case sensitive. (String; no whitespace, commas, or angle-brackets are permitted in the ID String itself)
184-
\item QUAL - quality: Phred-scaled quality score for the assertion made in ALT. i.e.\ $-10log_{10}$ prob(call in ALT is wrong). If ALT is `.' (no variant) then this is $-10log_{10}$ prob(variant), and if ALT is not `.' this is $-10log_{10}$ prob(no variant). If unknown, the missing value should be specified. (Numeric)
184+
\item QUAL - quality: Phred-scaled quality score for the assertion made in ALT. i.e.\ $-10\log_{10}$ prob(call in ALT is wrong). If ALT is `.' (no variant) then this is $-10\log_{10}$ prob(variant), and if ALT is not `.' this is $-10\log_{10}$ prob(no variant). If unknown, the missing value should be specified. (Numeric)
185185
\item FILTER - filter status: PASS if this position has passed all filters, i.e., a call is made at this position. Otherwise, if the site has not passed all filters, a semicolon-separated list of codes for filters that fail. e.g.\ ``q10;s50'' might indicate that at this site the quality is below 10 and the number of samples with data is below 50\% of the total number of samples. `0' is reserved and should not be used as a filter String. If filters have not been applied, then this field should be set to the missing value. (String, no whitespace or semicolons permitted)
186186
\item INFO - additional information: (String, no whitespace, semicolons, or equals-signs permitted; commas are permitted only as delimiters for lists of values) INFO fields are encoded as a semicolon-separated series of short keys with optional values in the format: $<$key$>$=$<$data$>$[,data]. If no keys are present, the missing value must be used. Arbitrary keys are permitted, although the following sub-fields are reserved (albeit optional):
187187
\begin{itemize}
@@ -221,11 +221,11 @@ \subsubsection{Genotype fields}
221221
\end{itemize}
222222
\item DP : read depth at this position for this sample (Integer)
223223
\item FT : sample genotype filter indicating if this genotype was ``called'' (similar in concept to the FILTER field). Again, use PASS to indicate that all filters have been passed, a semicolon-separated list of codes for filters that fail, or `.' to indicate that filters have not been applied. These values should be described in the meta-information in the same way as FILTERs (String, no whitespace or semicolons permitted)
224-
\item GL : genotype likelihoods comprised of comma separated floating point $log_{10}$-scaled likelihoods for all possible genotypes given the set of alleles defined in the REF and ALT fields. In presence of the GT field the same ploidy is expected and the canonical order is used; without GT field, diploidy is assumed. If A is the allele in REF and B,C,... are the alleles as ordered in ALT, the ordering of genotypes for the likelihoods is given by: F(j/k) = (k*(k+1)/2)+j. In other words, for biallelic sites the ordering is: AA,AB,BB; for triallelic sites the ordering is: AA,AB,BB,AC,BC,CC, etc. For example: GT:GL 0/1:-323.03,-99.29,-802.53 (Floats)
224+
\item GL : genotype likelihoods comprised of comma separated floating point $\log_{10}$-scaled likelihoods for all possible genotypes given the set of alleles defined in the REF and ALT fields. In presence of the GT field the same ploidy is expected and the canonical order is used; without GT field, diploidy is assumed. If A is the allele in REF and B,C,... are the alleles as ordered in ALT, the ordering of genotypes for the likelihoods is given by: F(j/k) = (k*(k+1)/2)+j. In other words, for biallelic sites the ordering is: AA,AB,BB; for triallelic sites the ordering is: AA,AB,BB,AC,BC,CC, etc. For example: GT:GL 0/1:-323.03,-99.29,-802.53 (Floats)
225225
\item GLE : genotype likelihoods of heterogeneous ploidy, used in presence of uncertain copy number. For example: GLE=0:-75.22,1:-223.42,0/0:-323.03,1/0:-99.29,1/1:-802.53 (String)
226226
\item PL : the $-10 \log_{10}$ scaled genotype likelihoods rounded to the closest integer, and otherwise defined in the same way as the GL field (Integers).
227227
\item GP : the phred-scaled genotype posterior probabilities (and otherwise defined precisely as the GL field); intended to store imputed genotype probabilities (Floats)
228-
\item GQ : conditional genotype quality, encoded as a phred quality $-10log_{10}$ p(genotype call is wrong, conditioned on the site's being variant) (Integer)
228+
\item GQ : conditional genotype quality, encoded as a phred quality $-10\log_{10}$ p(genotype call is wrong, conditioned on the site's being variant) (Integer)
229229
\item HQ : haplotype qualities, two comma separated phred qualities (Integers)
230230
\item PS : phase set. A phase set is defined as a set of phased genotypes to which this genotype belongs. Phased genotypes for an individual that are on the same chromosome and have the same PS value are in the same phased set. A phase set specifies multi-marker haplotypes for the phased genotypes in the set. All phased genotypes that do not contain a PS subfield are assumed to belong to the same phased set. If the genotype in the GT field is unphased, the corresponding PS field is ignored. The recommended convention is to use the position of the first variant in the set as the PS identifier (although this is not required). (Non-negative 32-bit Integer)
231231
\item PQ : phasing quality, the phred-scaled probability that alleles are ordered incorrectly in a heterozygote (against all other members in the phase set). We note that we have not yet included the specific measure for precisely defining ``phasing quality''; our intention for now is simply to reserve the PQ tag for future use as a measure of phasing quality. (Integer)
@@ -311,7 +311,7 @@ \section{FORMAT keys used for structural variants}
311311
##FORMAT=<ID=AHAP,Number=1,Type=Integer,Description="Unique identifier of ancestral haplotype">
312312
\end{verbatim}
313313
\normalsize
314-
These keys are analogous to GT/GQ/GL and are provided for genotyping imprecise events by copy number (either because there is an unknown number of alternate alleles or because the haplotypes cannot be determined). CN specifies the integer copy number of the variant in this sample. CNQ is encoded as a phred quality $-10log_{10}$ p(copy number genotype call is wrong). CNL specifies a list of $log_{10}$ likelihoods for each potential copy number, starting from zero. When possible, GT/GQ/GL should be used instead of (or in addition to) these keys.
314+
These keys are analogous to GT/GQ/GL and are provided for genotyping imprecise events by copy number (either because there is an unknown number of alternate alleles or because the haplotypes cannot be determined). CN specifies the integer copy number of the variant in this sample. CNQ is encoded as a phred quality $-10\log_{10}$ p(copy number genotype call is wrong). CNL specifies a list of $\log_{10}$ likelihoods for each potential copy number, starting from zero. When possible, GT/GQ/GL should be used instead of (or in addition to) these keys.
315315

316316
\section{Representing variation in VCF records}
317317
\subsection{Creating VCF entries for SNPs and small indels}
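
The genotype ordering rule quoted in the GL item of this file, F(j/k) = (k*(k+1)/2)+j, can be checked by hand for a triallelic site. The block below is an illustrative sketch only and is not part of the commit. It assumes alleles are indexed 0, 1, 2 for A, B, C, that j is at most k, and that the \texttt{amsmath} package is loaded for \texttt{align*}, \verb|\tfrac| and \verb|\text|.

```latex
% Worked instance of F(j/k) = (k*(k+1)/2)+j for REF=A, ALT=B,C.
% Assumptions: allele indices A=0, B=1, C=2, with j <= k; requires amsmath.
\begin{align*}
F(0/0) &= \tfrac{0 \cdot 1}{2} + 0 = 0 && \text{(AA)} \\
F(0/1) &= \tfrac{1 \cdot 2}{2} + 0 = 1 && \text{(AB)} \\
F(1/1) &= \tfrac{1 \cdot 2}{2} + 1 = 2 && \text{(BB)} \\
F(0/2) &= \tfrac{2 \cdot 3}{2} + 0 = 3 && \text{(AC)} \\
F(1/2) &= \tfrac{2 \cdot 3}{2} + 1 = 4 && \text{(BC)} \\
F(2/2) &= \tfrac{2 \cdot 3}{2} + 2 = 5 && \text{(CC)}
\end{align*}
```

The resulting indices 0 to 5 reproduce the order AA,AB,BB,AC,BC,CC given in the text. In the quoted example GT:GL 0/1:-323.03,-99.29,-802.53, the value at index F(0/1) = 1 is the largest of the three, which is consistent with the 0/1 genotype call.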

VCFv4.3.tex

+6-6
@@ -322,8 +322,8 @@ \subsubsection{Fixed fields}
322322
In other words, the ALT field must be a symbolic allele, or a breakend replacement string, or match the regular expression \texttt{\^{}([ACGTNacgtn]+|\string\*|\string\.)\$}.
323323
Tools processing VCF files are not required to preserve case in the allele String, except for IDs, which are case sensitive.
324324
(String; no whitespace, commas, or angle-brackets are permitted in the ID String itself)
325-
\item QUAL --- quality: Phred-scaled quality score for the assertion made in ALT. i.e.\ $-10log_{10}$ prob(call in ALT is wrong).
326-
If ALT is `.' (no variant) then this is $-10log_{10}$ prob(variant), and if ALT is not `.' this is $-10log_{10}$ prob(no variant).
325+
\item QUAL --- quality: Phred-scaled quality score for the assertion made in ALT. i.e.\ $-10\log_{10}$ prob(call in ALT is wrong).
326+
If ALT is `.' (no variant) then this is $-10\log_{10}$ prob(variant), and if ALT is not `.' this is $-10\log_{10}$ prob(no variant).
327327
If unknown, the MISSING value must be specified. (Float)
328328
\item FILTER --- filter status: PASS if this position has passed all filters, i.e., a call is made at this position.
329329
Otherwise, if the site has not passed all filters, a semicolon-separated list of codes for filters that fail. e.g.\ ``q10;s50'' might indicate that at this site the quality is below 10 and the number of samples with data is below 50\% of the total number of samples.
@@ -443,7 +443,7 @@ \subsubsection{Genotype fields}
443443
Again, use PASS to indicate that all filters have been passed, a semicolon-separated list of codes for filters that fail, or `.' to indicate that filters have not been applied.
444444
These values should be described in the meta-information in the same way as FILTERs.
445445
No whitespace or semicolons permitted.
446-
\item GQ (Integer): Conditional genotype quality, encoded as a phred quality $-10log_{10}$ p(genotype call is wrong, conditioned on the site's being variant).
446+
\item GQ (Integer): Conditional genotype quality, encoded as a phred quality $-10\log_{10}$ p(genotype call is wrong, conditioned on the site's being variant).
447447
\item GP (Float): Genotype posterior probabilities in the range 0 to 1 using the same ordering as the GL field; one use can be to store imputed genotype probabilities.
448448
\item GT (String): Genotype, encoded as allele values separated by either of $/$ or $\mid$.
449449
The allele values are 0 for the reference allele (what is in the REF field), 1 for the first allele listed in ALT, 2 for the second allele list in ALT and so on.
@@ -457,7 +457,7 @@ \subsubsection{Genotype fields}
457457
\item $\mid$ : genotype phased
458458
\end{itemize}
459459
460-
\item GL (Float): Genotype likelihoods comprised of comma separated floating point $log_{10}$-scaled likelihoods for all possible genotypes given the set of alleles defined in the REF and ALT fields.
460+
\item GL (Float): Genotype likelihoods comprised of comma separated floating point $\log_{10}$-scaled likelihoods for all possible genotypes given the set of alleles defined in the REF and ALT fields.
461461
In presence of the GT field the same ploidy is expected; without GT field, diploidy is assumed.
462462
463463
\textsc{Genotype Ordering.} \label{genotype-fields:genotype-ordering}
@@ -641,8 +641,8 @@ \section{FORMAT keys used for structural variants}
641641
\normalsize
642642
These keys are analogous to GT/GQ/GL/GP and are provided for genotyping imprecise events by copy number (either because there is an unknown number of alternate alleles or because the haplotypes cannot be determined).
643643
CN specifies the integer copy number of the variant in this sample.
644-
CNQ is encoded as a phred quality $-10log_{10}$ p(copy number genotype call is wrong).
645-
CNL specifies a list of $log_{10}$ likelihoods for each potential copy number, starting from zero.
644+
CNQ is encoded as a phred quality $-10\log_{10}$ p(copy number genotype call is wrong).
645+
CNL specifies a list of $\log_{10}$ likelihoods for each potential copy number, starting from zero.
646646
CNP is 0 to 1-scaled copy number posterior probabilities (and otherwise defined precisely as the CNL field), intended to store imputed genotype probabilities.
647647
When possible, GT/GQ/GL/GP should be used instead of (or in addition to) these keys.
648648

VCFv4.4.draft.tex

+6-6
@@ -327,8 +327,8 @@ \subsubsection{Fixed fields}
327327
In other words, the ALT field must be a symbolic allele, or a breakend replacement string, or match the regular expression \texttt{\^{}([ACGTNacgtn]+|\string\*|\string\.)\$}.
328328
Tools processing VCF files are not required to preserve case in the allele String, except for IDs, which are case sensitive.
329329
(String; no whitespace, commas, or angle-brackets are permitted in the ID String itself)
330-
\item QUAL --- quality: Phred-scaled quality score for the assertion made in ALT. i.e.\ $-10log_{10}$ prob(call in ALT is wrong).
331-
If ALT is `.' (no variant) then this is $-10log_{10}$ prob(variant), and if ALT is not `.' this is $-10log_{10}$ prob(no variant).
330+
\item QUAL --- quality: Phred-scaled quality score for the assertion made in ALT. i.e.\ $-10\log_{10}$ prob(call in ALT is wrong).
331+
If ALT is `.' (no variant) then this is $-10\log_{10}$ prob(variant), and if ALT is not `.' this is $-10\log_{10}$ prob(no variant).
332332
If unknown, the MISSING value must be specified. (Float)
333333
\item FILTER --- filter status: PASS if this position has passed all filters, i.e., a call is made at this position.
334334
Otherwise, if the site has not passed all filters, a semicolon-separated list of codes for filters that fail. e.g.\ ``q10;s50'' might indicate that at this site the quality is below 10 and the number of samples with data is below 50\% of the total number of samples.
@@ -448,7 +448,7 @@ \subsubsection{Genotype fields}
448448
Again, use PASS to indicate that all filters have been passed, a semicolon-separated list of codes for filters that fail, or `.' to indicate that filters have not been applied.
449449
These values should be described in the meta-information in the same way as FILTERs.
450450
No whitespace or semicolons permitted.
451-
\item GQ (Integer): Conditional genotype quality, encoded as a phred quality $-10log_{10}$ p(genotype call is wrong, conditioned on the site's being variant).
451+
\item GQ (Integer): Conditional genotype quality, encoded as a phred quality $-10\log_{10}$ p(genotype call is wrong, conditioned on the site's being variant).
452452
\item GP (Float): Genotype posterior probabilities in the range 0 to 1 using the same ordering as the GL field; one use can be to store imputed genotype probabilities.
453453
\item GT (String): Genotype, encoded as allele values separated by either of $/$ or $\mid$.
454454
The allele values are 0 for the reference allele (what is in the REF field), 1 for the first allele listed in ALT, 2 for the second allele list in ALT and so on.
@@ -462,7 +462,7 @@ \subsubsection{Genotype fields}
462462
\item $\mid$ : genotype phased
463463
\end{itemize}
464464
465-
\item GL (Float): Genotype likelihoods comprised of comma separated floating point $log_{10}$-scaled likelihoods for all possible genotypes given the set of alleles defined in the REF and ALT fields.
465+
\item GL (Float): Genotype likelihoods comprised of comma separated floating point $\log_{10}$-scaled likelihoods for all possible genotypes given the set of alleles defined in the REF and ALT fields.
466466
In presence of the GT field the same ploidy is expected; without GT field, diploidy is assumed.
467467
468468
\textsc{Genotype Ordering.} \label{genotype-fields:genotype-ordering}
@@ -774,8 +774,8 @@ \section{FORMAT keys used for structural variants}
774774
\normalsize
775775
These keys are analogous to GT/GQ/GL/GP and are provided for genotyping imprecise events by copy number (either because there is an unknown number of alternate alleles or because the haplotypes cannot be determined).
776776
CN specifies the integer copy number of the variant in this sample.
777-
CNQ is encoded as a phred quality $-10log_{10}$ p(copy number genotype call is wrong).
778-
CNL specifies a list of $log_{10}$ likelihoods for each potential copy number, starting from zero.
777+
CNQ is encoded as a phred quality $-10\log_{10}$ p(copy number genotype call is wrong).
778+
CNL specifies a list of $\log_{10}$ likelihoods for each potential copy number, starting from zero.
779779
CNP is 0 to 1-scaled copy number posterior probabilities (and otherwise defined precisely as the CNL field), intended to store imputed genotype probabilities.
780780
When possible, GT/GQ/GL/GP should be used instead of (or in addition to) these keys.
781781
