Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] pseudo-p significance calculation #281

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
127 changes: 127 additions & 0 deletions esda/significance.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,127 @@
import numpy as np
from scipy import stats

Check warning on line 2 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L1-L2

Added lines #L1 - L2 were not covered by tests


def calculate_significance(test_stat, reference_distribution, method="two-sided"):

Check warning on line 5 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L5

Added line #L5 was not covered by tests
"""
Calculate a pseudo p-value from a reference distribution.

Pseudo-p values are calculated using the formula (M + 1) / (R + 1). Where R is the number of simulations
and M is the number of times that the simulated value was equal to, or more extreme than the observed test statistic.
Comment on lines +9 to +10
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: describe the adjustment here


Parameters
----------
test_stat: float or numpy.ndarray
The observed test statistic, or a vector of observed test statistics
reference_distribution: numpy.ndarray
A numpy array containing simulated test statistics as a result of conditional permutation.
method: string
One of 'two-sided', 'lesser', or 'greater'. Indicates the alternative hypothesis.
- 'two-sided': the observed test-statistic is an extreme value of the reference distribution.
- 'lesser': the observed test-statistic is small relative to the reference distribution.
- 'greater': the observed test-statistic is large relative to the reference distribution.
- 'directed': the observed test statistic is either small or large reltaive to the reference distribution.
Comment on lines +19 to +23
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This states that the valid arguments are 'two-sided', 'lesser', or 'greater' however there is also a directed argument documented as well—though it is unclear to me how directed and two-sided are different.


Notes
-----

the directed p-value is half of the two-sided p-value, and corresponds to running the
lesser and greater tests, then picking the smaller significance value. This is not advised.
Comment on lines +28 to +29
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this will be untrue if the adjustment is added

"""
reference_distribution = np.atleast_2d(reference_distribution)
n_samples, p_permutations = reference_distribution.shape
test_stat = np.atleast_2d(test_stat).reshape(n_samples, -1)

Check warning on line 33 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L31-L33

Added lines #L31 - L33 were not covered by tests
if method == "directed":
larger = (reference_distribution >= test_stat).sum(axis=1)
low_extreme = (p_permutations - larger) < larger
larger[low_extreme] = p_permutations - larger[low_extreme]
p_value = (larger + 1.0) / (p_permutations + 1.0)

Check warning on line 38 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L35-L38

Added lines #L35 - L38 were not covered by tests
elif method == "lesser":
p_value = (np.sum(reference_distribution <= test_stat, axis=1) + 1) / (

Check warning on line 40 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L40

Added line #L40 was not covered by tests
p_permutations + 1
)
elif method == "greater":
p_value = (np.sum(reference_distribution >= test_stat, axis=1) + 1) / (

Check warning on line 44 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L44

Added line #L44 was not covered by tests
p_permutations + 1
)
elif method == "two-sided":
percentile = (reference_distribution < test_stat).mean(axis=1)
bounds = np.column_stack((1 - percentile, percentile)) * 100
bounds.sort(axis=1)

Check warning on line 50 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L48-L50

Added lines #L48 - L50 were not covered by tests
lows, highs = np.row_stack(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be able to be vectorised, but I couldn't quickly figure that out. the following did not generate the same results as below:

stats.scoreatpercentile(reference_distribution, bounds, axis=1)

[
stats.scoreatpercentile(r, per=p)
for r, p in zip(reference_distribution, bounds)
]
).T
n_outside = (reference_distribution < lows[:, None]).sum(axis=1)
n_outside += (reference_distribution > highs[:, None]).sum(axis=1)
p_value = (n_outside + 1) / (p_permutations + 1)

Check warning on line 59 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L57-L59

Added lines #L57 - L59 were not covered by tests
elif method == "folded":
means = reference_distribution.mean(axis=1, keepdims=True)
test_stat = np.abs(test_stat - means)
reference_distribution = np.abs(reference_distribution - means)
p_value = ((reference_distribution >= test_stat).sum(axis=1) + 1) / (

Check warning on line 64 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L61-L64

Added lines #L61 - L64 were not covered by tests
p_permutations + 1
)
else:
raise ValueError(

Check warning on line 68 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L68

Added line #L68 was not covered by tests
f"Unknown p-value method: {method}. Generally, 'two-sided' is a good default!"
)
return p_value

Check warning on line 71 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L71

Added line #L71 was not covered by tests


if __name__ == "__main__":
import numpy
import esda
import pandas
from libpysal.weights import Voronoi

Check warning on line 78 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L75-L78

Added lines #L75 - L78 were not covered by tests

coordinates = numpy.random.random(size=(2000, 2))
x = numpy.random.normal(size=(2000,))
w = Voronoi(coordinates, clip="bbox")
w.transform = "r"
stat = esda.Moran_Local(x, w)

Check warning on line 84 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L80-L84

Added lines #L80 - L84 were not covered by tests

ts = calculate_significance(stat.Is, stat.rlisas, method="two-sided")
di = calculate_significance(stat.Is, stat.rlisas, method="directed")
lt = calculate_significance(stat.Is, stat.rlisas, method="lesser")
gt = calculate_significance(stat.Is, stat.rlisas, method="greater")
fo = calculate_significance(stat.Is, stat.rlisas, method="folded")

Check warning on line 90 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L86-L90

Added lines #L86 - L90 were not covered by tests

numpy.testing.assert_array_equal(

Check warning on line 92 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L92

Added line #L92 was not covered by tests
numpy.minimum(lt, gt), di
) # di is just the minimum of the two tests

print(

Check warning on line 96 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L96

Added line #L96 was not covered by tests
f"directed * 2 is the same as two-sided {(di*2 == ts).mean()*100}% of the time"
)

print(

Check warning on line 100 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L100

Added line #L100 was not covered by tests
pandas.DataFrame(
numpy.column_stack((ts, di, fo, lt, gt)),
columns=["two-sided", "directed", "folded", "lt", "gt"],
).corr()
)

answer = input("run big simulation? [y/n]")

Check warning on line 107 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L107

Added line #L107 was not covered by tests
if answer.lower().startswith("y"):
all_correlations = []

Check warning on line 109 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L109

Added line #L109 was not covered by tests
for i in range(1000):
x = numpy.random.normal(size=(2000,))
stat = esda.Moran_Local(x, w)
ts = calculate_significance(stat.Is, stat.rlisas, method="two-sided")
di = calculate_significance(stat.Is, stat.rlisas, method="directed")
lt = calculate_significance(stat.Is, stat.rlisas, method="lesser")
gt = calculate_significance(stat.Is, stat.rlisas, method="greater")
fo = calculate_significance(stat.Is, stat.rlisas, method="folded")
corrs = (

Check warning on line 118 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L111-L118

Added lines #L111 - L118 were not covered by tests
pandas.DataFrame(
numpy.column_stack((ts, di, fo, lt, gt)),
columns=["two-sided", "directed", "folded", "lt", "gt"],
)
.corr()
.assign(repno=i)
)
all_correlations.append(corrs)
all_correlations = pandas.concat(all_correlations)

Check warning on line 127 in esda/significance.py

View check run for this annotation

Codecov / codecov/patch

esda/significance.py#L126-L127

Added lines #L126 - L127 were not covered by tests
Loading