
Commit 49c5e6a

committed
Add filtering/annotation support for Fisher's exact test
There are multiple ways to query and annotate, for example:

bcftools +fill-tags test.vcf -- -t 'INFO/FT=phred(fisher(INFO/DP4))'

Resolves #1582
1 parent cef68bc, commit 49c5e6a

15 files changed (+371 -22 lines)
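The example in the commit message, `phred(fisher(INFO/DP4))`, runs Fisher's exact test on the 2x2 table formed by the four DP4 counts and phred-scales the p-value. The Python sketch below shows that calculation under stated assumptions: the two-tailed p-value is taken to be the total probability of all tables with the same row and column sums that are no more likely than the observed one, and phred scaling is -10*log10(p). The function names are hypothetical; this is not the bcftools implementation, which may treat near-ties differently.

```python
from fractions import Fraction
from math import comb, log10


def fisher_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the p-value as an exact Fraction, or None when the table is empty
    (mirroring the documented "N=0 evaluates to a missing value").
    """
    n = a + b + c + d
    if n == 0:
        return None
    row1, col1 = a + b, a + c

    # With fixed margins, the top-left cell x determines the whole table, and
    # comb(row1, x) * comb(n - row1, col1 - x) is proportional to its probability.
    def weight(x):
        return comb(row1, x) * comb(n - row1, col1 - x)

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    observed = weight(a)
    # Sum every table no more likely than the observed one, using exact integers
    # so that ties are compared without floating-point error.
    tail = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return Fraction(tail, comb(n, col1))


def phred(p):
    """Phred-scale a probability as -10 * log10(p); None stays None."""
    if p is None:
        return None
    # math.log10 accepts arbitrarily large ints, so tiny p-values do not underflow.
    return -10 * (log10(p.numerator) - log10(p.denominator))


if __name__ == "__main__":
    dp4 = (8, 2, 1, 5)  # ref-fwd, ref-rev, alt-fwd, alt-rev (illustrative counts)
    p = fisher_two_tailed(*dp4)
    print(float(p), phred(p))
```

Keeping the p-value as a `Fraction` makes the "no more likely than observed" comparison exact; a production implementation would more likely work in log space with floats.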

NEWS (+9 -2)

@@ -4,9 +4,11 @@ Changes affecting the whole of bcftools, or multiple commands:
 
 * Add support for matching lines by ID via the --pair-logic and --collapse options (#1739)
 
-* The -i/-e filtering expressions now properly match the regex negation of missing
-values, e.g. -i 'TAG!~"\."' (#2355)
+* The -i/-e filtering expressions
+
+- The expressions now properly match the regex negation of missing values, e.g. -i 'TAG!~"\."' (#2355)
 
+- Added support for Fisher's exact test
 
 Changes affecting specific commands:
 
@@ -32,6 +34,11 @@ Changes affecting specific commands:
 - Check the input GFF for features outside transcript boundaries and extend the transcript
 to contain the feature fully (#2323)
 
+* bcftools +fill-tags
+
+- Thanks to the extension of filtering expressions with Fisher's exact test, the plugin
+can now be used to add FT annotation (#1582)
+
 * bcftools merge
 
 - Preserve phasing in half-missing genotypes (#2331)
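The NEWS entry says +fill-tags can now add an FT annotation, as in the commit message's `-t 'INFO/FT=phred(fisher(INFO/DP4))'`. The sketch below only illustrates what such an annotation amounts to on one VCF data line: read DP4 from the INFO column (the eighth tab-separated column), compute a score from the four counts (for example a phred-scaled Fisher p-value), and append or replace `FT=<score>`. `add_dp4_tag` and the `score` callable are hypothetical, the number formatting is an assumption, and records without a complete DP4 are simply left untouched here; real use should go through bcftools itself.

```python
def add_dp4_tag(vcf_line, tag, score):
    """Return vcf_line with INFO/<tag>=score(*DP4) added (hypothetical helper).

    vcf_line: one tab-separated VCF data line (not a header line).
    score:    callable taking the four DP4 counts, returning a number or None.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    info = cols[7]  # CHROM POS ID REF ALT QUAL FILTER INFO ...
    entries = [] if info == "." else info.split(";")
    dp4 = None
    for entry in entries:
        key, _, value = entry.partition("=")
        if key == "DP4":
            dp4 = value.split(",")
    # Assumption: leave the record unchanged when DP4 is absent, incomplete or missing.
    if dp4 is None or len(dp4) != 4 or "." in dp4:
        return "\t".join(cols)
    value = score(*(int(x) for x in dp4))
    if value is None:
        return "\t".join(cols)
    # Drop any existing copy of the tag, then append the new value
    # ("g" formatting is an arbitrary choice for this sketch).
    entries = [e for e in entries if e.partition("=")[0] != tag]
    entries.append(f"{tag}={value:g}")
    cols[7] = ";".join(entries)
    return "\t".join(cols)
```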

doc/bcftools.1 (+16 -5)

@@ -2,12 +2,12 @@
 .\" Title: bcftools
 .\" Author: [see the "AUTHOR(S)" section]
 .\" Generator: Asciidoctor 2.0.20
-.\" Date: 2025-01-29
+.\" Date: 2025-02-27
 .\" Manual: \ \&
 .\" Source: \ \&
 .\" Language: English
 .\"
-.TH "BCFTOOLS" "1" "2025-01-29" "\ \&" "\ \&"
+.TH "BCFTOOLS" "1" "2025-02-27" "\ \&" "\ \&"
 .ie \n(.g .ds Aq \(aq
 .el .ds Aq '
 .ss \n[.ss] 0
@@ -51,7 +51,7 @@ standard input (stdin) and outputs to the standard output (stdout). Several
 commands can thus be combined with Unix pipes.
 .SS "VERSION"
 .sp
-This manual page was last updated \fB2025\-01\-29 08:37 CET\fP and refers to bcftools git version \fB1.21\-72\-g724713f+\fP.
+This manual page was last updated \fB2025\-02\-27 12:33 CET\fP and refers to bcftools git version \fB1.21\-79\-gcef68bc+\fP.
 .SS "BCF1"
 .sp
 The obsolete BCF1 format output by versions of samtools <= 0.1.19 is \fBnot\fP
@@ -6344,8 +6344,8 @@ sMAX, sMIN, sAVG, sMEAN, sMEDIAN, sSTDEV, sSUM
 . sp -1
 . IP \(bu 2.3
 .\}
-two\-tailed binomial test. Note that for N=0 the test evaluates to a missing value
-and when FORMAT/GT is used to determine the vector indices, it evaluates to 1 for
+two\-tailed binomial and fisher test. Note that for N=0 the test evaluates to a missing
+value and when FORMAT/GT is used to determine the vector indices, it evaluates to 1 for
 homozygous genotypes.
 .sp
 .if n .RS 4
@@ -6357,6 +6357,17 @@ phred(binom()) .. the same as binom but phred\-scaled
 .fam
 .fi
 .if n .RE
+.sp
+.if n .RS 4
+.nf
+.fam C
+fisher(INFO/DP4)
+fisher(FORMAT/DP4)
+fisher(FMT/ADF,FMT/ADR) .. GT can be used to determine the correct indices
+fisher(FMT/ADF[:0,1],FMT/ADR[:0,1]) .. or the fields can be given explicitly
+.fam
+.fi
+.if n .RE
 .RE
 .sp
 .RS 4
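The new manual text says that fisher(FMT/ADF,FMT/ADR) can use GT to pick the vector indices, that the test evaluates to 1 for homozygous genotypes, and that it is missing when N=0. The sketch below restates those rules for a single sample as a hypothetical Python helper, `strand_table_from_gt`, which only builds the 2x2 table that would be handed to the test. Treating N as the sum of the four selected counts, returning missing for genotypes with more than two distinct alleles, and checking homozygosity before N=0 are assumptions, not documented behavior.

```python
def strand_table_from_gt(gt, adf, adr):
    """Select the per-sample strand table for fisher(FMT/ADF,FMT/ADR) using GT.

    gt:  genotype string such as "0/1" or "1|1"
    adf: forward-strand depths per allele (FORMAT/ADF), e.g. [12, 7]
    adr: reverse-strand depths per allele (FORMAT/ADR), e.g. [10, 1]

    Returns ((adf_i, adr_i), (adf_j, adr_j)) for a heterozygous genotype with
    alleles i < j, 1.0 for a homozygous genotype, or None for a missing result.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None  # missing genotype
    idx = sorted({int(a) for a in alleles})
    if len(idx) == 1:
        return 1.0  # documented: homozygous genotypes evaluate to 1
    if len(idx) != 2:
        return None  # assumption: more than two distinct alleles is not handled
    i, j = idx
    if j >= len(adf) or j >= len(adr):
        return None  # assumption: depth vectors too short to index
    table = ((adf[i], adr[i]), (adf[j], adr[j]))
    if any(v is None for row in table for v in row):
        return None  # assumption: a missing depth makes the result missing
    if sum(sum(row) for row in table) == 0:
        return None  # documented: N=0 evaluates to a missing value
    return table
```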

doc/bcftools.html (+12 -4)

@@ -50,7 +50,7 @@ <h2 id="_description">DESCRIPTION</h2>
 <div class="sect2">
 <h3 id="_version">VERSION</h3>
 <div class="paragraph">
-<p>This manual page was last updated <strong>2025-01-29 08:37 CET</strong> and refers to bcftools git version <strong>1.21-72-g724713f+</strong>.</p>
+<p>This manual page was last updated <strong>2025-02-27 12:33 CET</strong> and refers to bcftools git version <strong>1.21-79-gcef68bc+</strong>.</p>
 </div>
 </div>
 <div class="sect2">
@@ -5409,8 +5409,8 @@ <h2 id="expressions">FILTERING EXPRESSIONS</h2>
 </div>
 </li>
 <li>
-<p>two-tailed binomial test. Note that for N=0 the test evaluates to a missing value
-and when FORMAT/GT is used to determine the vector indices, it evaluates to 1 for
+<p>two-tailed binomial and fisher test. Note that for N=0 the test evaluates to a missing
+value and when FORMAT/GT is used to determine the vector indices, it evaluates to 1 for
 homozygous genotypes.</p>
 <div class="literalblock">
 <div class="content">
@@ -5419,6 +5419,14 @@ <h2 id="expressions">FILTERING EXPRESSIONS</h2>
 phred(binom()) .. the same as binom but phred-scaled</pre>
 </div>
 </div>
+<div class="literalblock">
+<div class="content">
+<pre>fisher(INFO/DP4)
+fisher(FORMAT/DP4)
+fisher(FMT/ADF,FMT/ADR) .. GT can be used to determine the correct indices
+fisher(FMT/ADF[:0,1],FMT/ADR[:0,1]) .. or the fields can be given explicitly</pre>
+</div>
+</div>
 </li>
 <li>
 <p>variables calculated on the fly if not present: number of alternate alleles;
@@ -5723,7 +5731,7 @@ <h2 id="_copying">COPYING</h2>
 </div>
 <div id="footer">
 <div id="footer-text">
-Last updated 2025-01-29 08:37:53 +0100
+Last updated 2025-02-27 12:28:57 +0100
 </div>
 </div>
 </body>
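The documentation context in this hunk also covers the existing two-tailed binomial test, binom(FMT/AD) or binom(AD[0],AD[1]), which shares the N=0 rule with the new fisher function. Below is a minimal Python sketch of a two-tailed binomial test on two allele counts, assuming an expected 50:50 split (the usual allelic-balance null for heterozygous calls) and the same "no more likely than observed" definition of the two tails; `binom_two_tailed` is a hypothetical name, not a bcftools function.

```python
from fractions import Fraction
from math import comb


def binom_two_tailed(k_ref, k_alt):
    """Two-tailed binomial test of k_ref vs k_alt reads, assuming p = 0.5.

    Returns an exact Fraction, or None when N = k_ref + k_alt is 0.
    """
    n = k_ref + k_alt
    if n == 0:
        return None  # documented: N=0 evaluates to a missing value
    observed = comb(n, k_ref)
    # Under p = 0.5 every outcome x has probability comb(n, x) / 2**n, so the two
    # tails are all outcomes whose binomial coefficient does not exceed the observed one.
    tail = sum(c for c in (comb(n, x) for x in range(n + 1)) if c <= observed)
    return Fraction(tail, 2 ** n)
```

For example, 2 reference and 8 alternate reads give (1 + 10 + 45) * 2 / 1024 = 7/64.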

doc/bcftools.txt (+7 -2)

@@ -4019,14 +4019,19 @@ they will evaluate to a vector of per-sample values when applied on FORMAT tags:
 SMPL_MAX, SMPL_MIN, SMPL_AVG, SMPL_MEAN, SMPL_MEDIAN, SMPL_STDEV, SMPL_SUM,
 sMAX, sMIN, sAVG, sMEAN, sMEDIAN, sSTDEV, sSUM
 
-* two-tailed binomial test. Note that for N=0 the test evaluates to a missing value
-and when FORMAT/GT is used to determine the vector indices, it evaluates to 1 for
+* two-tailed binomial and fisher test. Note that for N=0 the test evaluates to a missing
+value and when FORMAT/GT is used to determine the vector indices, it evaluates to 1 for
 homozygous genotypes.
 
 binom(FMT/AD) .. GT can be used to determine the correct index
 binom(AD[0],AD[1]) .. or the fields can be given explicitly
 phred(binom()) .. the same as binom but phred-scaled
 
+fisher(INFO/DP4)
+fisher(FORMAT/DP4)
+fisher(FMT/ADF,FMT/ADR) .. GT can be used to determine the correct indices
+fisher(FMT/ADF[:0,1],FMT/ADR[:0,1]) .. or the fields can be given explicitly
+
 * variables calculated on the fly if not present: number of alternate alleles;
 number of samples; count of alternate alleles; minor allele count (similar to
 AC but always picks the allele with frequency smaller than 0.5); frequency of alternate alleles (AF=AC/AN);
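The fisher() forms listed in this hunk differ mainly in where the four counts come from. The Python sketch below builds the 2x2 tables for two of them: INFO/DP4, whose four values are the ref-forward, ref-reverse, alt-forward and alt-reverse read counts, and the explicit per-sample form fisher(FMT/ADF[:0,1],FMT/ADR[:0,1]), read here as subfields 0 and 1 of ADF and ADR for every sample. The helper names and the row/column orientation are assumptions; because Fisher's exact test gives the same p-value for a table and its transpose, pairing the values by allele or by strand does not change the result.

```python
def table_from_dp4(dp4):
    """INFO/DP4 -> ((ref_fwd, ref_rev), (alt_fwd, alt_rev)); None if incomplete."""
    if dp4 is None or len(dp4) != 4 or None in dp4:
        return None
    ref_fwd, ref_rev, alt_fwd, alt_rev = dp4
    return ((ref_fwd, ref_rev), (alt_fwd, alt_rev))


def tables_from_explicit(adf_by_sample, adr_by_sample, idx=(0, 1)):
    """Per-sample tables for fisher(FMT/ADF[:i,j],FMT/ADR[:i,j]) (hypothetical helper).

    adf_by_sample / adr_by_sample: one list of per-allele depths per sample,
    with None standing in for a missing value or a missing vector.
    """
    i, j = idx
    tables = []
    for adf, adr in zip(adf_by_sample, adr_by_sample):
        try:
            table = ((adf[i], adr[i]), (adf[j], adr[j]))
        except (IndexError, TypeError):
            table = None  # assumption: short or missing vectors give a missing result
        if table is not None and None in (table[0] + table[1]):
            table = None  # assumption: any missing depth makes the sample missing
        tables.append(table)
    return tables
```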
