# $\newcommand{L}[1]{\| #1 \|}\newcommand{VL}[1]{\L{ \vec{#1} }}\newcommand{R}[1]{\operatorname{Re}\,(#1)}\newcommand{I}[1]{\operatorname{Im}\, (#1)}$
#
# ## Thresholding with false discovery rate
#
# Written with J-B Poline.
#
# The false discovery rate is a different *type* of correction than
# family-wise correction. Instead of controlling for the risk of *any
# tests* falsely being declared significant under the null hypothesis, FDR
# will control the *number of tests falsely declared significant as a
# proportion of the number of all tests declared significant*.
#
# The basic idea behind the FDR is the following.
#
# We have a large number of p values from a set of individual tests.
# These might be p values from tests on a set of brain voxels.
#
# We are trying to find a p value threshold $\theta$ to do a
# reasonable job of distinguishing true positive tests from true
# negatives. p values that are less than or equal to $\theta$ are
# *detections* and $\theta$ is a *detection threshold*.
#
# We want to choose a detection threshold that will only allow a small
# number of false positive detections.
#
# A *detection* can also be called a *discovery*; hence false discovery
# rate.
#
# For the FDR, we will try to find a p value within the family of tests
# (the set of p values), that we can use as a detection threshold.
#
# Let’s look at the p value for a particular test. Let’s say there are
# $N$ tests, indexed with $i \in 1 .. N$. We look at a test
# $i$, and consider using p value from this test as a detection
# threshold; $\theta = p(i)$. The expected number of false positives
# (FP) in N tests at this detection threshold would be:
#
# $$
# E(FP) = N p(i)
# $$
#
# For example, if we had 100 tests, and the particular p value
# $p(i)$ was 0.1, then the expected number of false positive
# detections, thresholding at 0.1, is $0.1 * 100 = 10$.
#
# Let’s take some data from a random normal distribution to illustrate:
from __future__ import print_function, division
import numpy as np
import matplotlib.pyplot as plt
# If running in the IPython console, consider running `%matplotlib` to enable
# interactive plots. If running in the Jupyter Notebook, use `%matplotlib
# inline`.
np.random.seed(42) # so we always get the same random numbers
N = 100
z_values = np.random.normal(size=N)
# Turn the Z values into p values:
import scipy.stats as sst
normal_distribution = sst.norm(loc=0, scale=1.)  # loc is the mean, scale is the standard deviation
# The normal CDF
p_values = normal_distribution.cdf(z_values)
# To make it easier to show, we sort the p values from smallest to
# largest:
p_values = np.sort(p_values)
i = np.arange(1, N+1) # the 1-based i index of the p values, as in p(i)
plt.plot(i, p_values, '.')
plt.xlabel('$i$')
plt.ylabel('p value')
# Notice the (more or less) straight line of p value against $i$
# index in this case, where there is no signal in the random noise.
#
# We want to find a p value threshold $p(i)$ where there is only a
# small *proportion* of false positives among the detections. For example,
# we might accept a threshold such that 5% of all detections (discoveries)
# are likely to be false positives. If $d$ is the number of
# discoveries at threshold $\theta$, and $q$ is the proportion
# of false positives we will accept (e.g. 0.05), then we want a threshold
# $\theta$ such that $E(FP) / d < q$ where $E(x)$ is the
# expectation of $x$, here the number of FP I would get *on average*
# if I was to repeat my experiment many times.
#
# So - what is $d$ in the plot above? Now that we have ordered the p
# values, for any index $i$, if we threshold at
# $\theta \le p(i)$ we will have $i$ detections
# ($d = i$). Therefore we want to find the largest $p(i)$ such
# that $E(FP) / i < q$. We know $E(FP) = N p(i)$ so we want
# the largest $p(i)$ such that:
#
# $$
# N p(i) / i < q \implies p(i) < q i / N
# $$
#
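# As a side note, we could wrap this rule in a small function. This is
# only a sketch; the function name and the handling of the case where no
# p value passes are my own choices, not something used in the code below:
def fdr_threshold(p_values, q):
    """Return the largest p(i) with p(i) < q * i / N, or None if there is none.

    N is the number of p values. This is just a sketch of the rule derived above.
    """
    p_sorted = np.sort(np.asarray(p_values))
    n_tests = len(p_sorted)
    i_indexes = np.arange(1, n_tests + 1)       # 1-based index i, as in p(i)
    below = p_sorted < q * i_indexes / n_tests  # True where p(i) < q i / N
    if not np.any(below):
        return None                             # no p value passes - no detections
    return p_sorted[np.max(np.where(below)[0])]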
# Let’s take $q$ (the acceptable proportion of false discoveries among the
# detections) as 0.05. We plot $q i / N$ (in red) on the same graph as
# $p(i)$ (in blue):
q = 0.05
plt.plot(i, p_values, 'b.', label='$p(i)$')
plt.plot(i, q * i / N, 'r', label='$q i / N$')
plt.xlabel('$i$')
plt.ylabel('$p$')
plt.legend()
# Our job is to look for the largest $p(i)$ value (blue dot) that is
# still underneath $q i / N$ (the red line).
#
# The red line $q i / N$ is the acceptable number of false positives
# $q i$ as a proportion of all the tests $N$. Points further to the
# right on the red line correspond to a larger acceptable number of false
# positives. For example, for $i = 1$, the acceptable number of
# false positives $q * i$ is $0.05 * 1 = 0.05$, but at
# $i = 50$, the acceptable number of expected false positives
# $q * i$ is $0.05 * 50 = 2.5$.
#
# Notice that, if only the first p value passes threshold, then
# $p(1) < q \cdot 1 / N$. So, if $q = 0.05$,
# $p(1) < 0.05 / N$. This is the Bonferroni correction for $N$
# tests.
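#
# As a quick numeric check of that last point, using the values we already
# have ($q = 0.05$, $N = 100$), the two expressions give the same number:
print('FDR criterion at i = 1 (q * 1 / N):', q * 1 / N)
print('Bonferroni threshold (0.05 / N):', 0.05 / N)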
#
# The FDR becomes more interesting when there is signal in the noise. In
# this case there will be p values that are smaller than expected on the
# null hypothesis. This causes the p value line to start below the
# diagonal on the ordered plot, because of the high density of low p
# values.
N_signal = 20
N_noise = N - N_signal
noise_z_values = np.random.normal(size=N_noise)
# Add some signal with very low z scores / p values
signal_z_values = np.random.normal(loc=-2.5, size=N_signal)
mixed_z_values = np.sort(np.concatenate((noise_z_values, signal_z_values)))
mixed_p_values = normal_distribution.cdf(mixed_z_values)
plt.plot(i, mixed_p_values, 'b.', label='$p(i)$')
plt.plot(i, q * i / N, 'r', label='$q i / N$')
plt.xlabel('$i$')
plt.ylabel('$p$')
plt.legend()
# The interesting part is the beginning of the graph, where the blue p
# values stay below the red line:
first_i = i[:30]
plt.plot(first_i, mixed_p_values[:30], 'b.', label='$p(i)$')
plt.plot(first_i, q * first_i / N, 'r', label='$q i / N$')
plt.xlabel('$i$')
plt.ylabel('$p$')
plt.legend()
# We are looking for the largest $p(i) < q i / N$, which corresponds to
# the last blue point below the red line.
below = mixed_p_values < (q * i / N) # True where p(i)<qi/N
max_below = np.max(np.where(below)[0]) # Max Python array index where p(i)<qi/N
print('p_i:', mixed_p_values[max_below])
print('i:', max_below + 1) # Python indices 0-based, we want 1-based
# The Bonferroni threshold is:
print('Bonferroni threshold:', 0.05 / N)
# In this case, where there is signal in the noise, the FDR threshold
# *adapts* to the presence of the signal, by taking into account that some
# values have small enough p values that they can be assumed to be signal,
# so that there are fewer noise comparisons to correct for, and the
# threshold is correspondingly less stringent.
#
# As the FDR threshold becomes less stringent, the number of detections
# increases, and the expected number of false positive detections
# increases, because the FDR controls the *proportion* of false positives
# in the detections. In our case, the expected number of false positives
# in the detections is $q i = 0.05 * 9 = 0.45$. In other words, at this
# threshold we expect, on average, 0.45 false positives among the detected
# positive tests, so there is at most a 45% chance of seeing at least one
# false positive among the detections.
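#
# To make that concrete, here is the same calculation done from the values
# we computed above (only the variable name n_detections is new; everything
# else was already defined):
n_detections = max_below + 1      # number of detections at the FDR threshold
print('Expected false positives at FDR threshold:', q * n_detections)
print('Detections at the Bonferroni threshold:',
      np.count_nonzero(mixed_p_values <= 0.05 / N))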
#
# So, there are a number of interesting properties of the FDR - and some
# not so interesting if you want to do brain imaging.
#
# * In the case of no signal at all, the FDR threshold will be the
# Bonferroni threshold (see the sketch after this list)
#
# * Under some conditions (see Benjamini and Hochberg, JRSS-B 1995), the
# FDR threshold can be applied to correlated data
#
# * FDR is an “adaptive” threshold
#
# Not so “interesting”:
#
# * FDR can be very variable
#
# * When there are lots of true positives, and many detections, the
# number of false positive detections increases. This can make FDR
# detections more difficult to interpret.
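#
# As a rough check of the first point in the list above, we can apply the
# same criterion to the pure-noise p values from the start of the page.
# This is only a sketch; for this particular random sample there may well
# be no detections at all, which is also what the Bonferroni threshold
# typically gives under the null:
noise_below = p_values < q * i / N      # p_values is the pure-noise set from above
if np.any(noise_below):
    print('FDR threshold on pure noise:',
          p_values[np.max(np.where(noise_below)[0])])
else:
    print('No detections for the pure-noise sample at the FDR criterion')
print('Bonferroni threshold:', 0.05 / N)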