Skip to content

Commit 9f8222e

Browse files
committed
Two-way two-locus statistics (python proto)
This PR implements two-way LD statistics, specified between sample sets. During the development of this functionality, a number of issues with the designation of state_dims/result_dims were discovered. These have been resolved, testing clean for existing code and providing the proper behavior for this new code. The mechanism by which users will specify a multi-population (or two-way) statistic is by providing the `index` argument. This helps us avoid creating another `ld_matrix` method for the TreeSequence object. In other words, for a one-way statistic, a user would specify: ```python ts.ld_matrix(stat="D2", sample_sets=[[ss1, ss2]]) ``` Which would output a 3D ndarray containing one LD matrix per sample set. ```python ts.ld_matrix(stat="D2", sample_sets=[[ss1, ss2]], indexes=[(0, 1)]) ``` Which would output a 2D ndarray containing one LD matrix for the index pair. This would use our `D2_ij_summary_func`, instead of the `D2_summary_func`. Finally, if a user provided ```python ts.ld_matrix(stat="D2", sample_sets=[[ss1, ss2]], indexes=[(0, 1), (1, 1)]) ``` We would output a 3D ndarray containing one LD matrix _per_ index pair provided. Since these are two-way statistics, the indexes must be length 2. We plan on enabling users to implement k-way via a "general_stat" api. We did not implement anything more than two-way statistics here because of the combinatoric explosion of logic required for indexes > 2. I added some basic tests to demonstrate that things were working properly. If we compute two-way statistics on identical sample sets, they should be equal to the one-way statistics. Unfortunately, this does not apply to unbiased statistics, which I've validated manually. I've also cleaned up the docstrings a bit and fixed a bug with the D_prime statistic, which should not be weighted by haplotype frequency.
1 parent c86884d commit 9f8222e

File tree

3 files changed

+412
-75
lines changed

3 files changed

+412
-75
lines changed

c/tests/test_stats.c

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2870,8 +2870,7 @@ test_two_site_correlated_multiallelic(void)
28702870
double truth_D2[4] = { 0.023844603634269844, 0.02384460363426984,
28712871
0.02384460363426984, 0.02384460363426984 };
28722872
double truth_r2[4] = { 1, 1, 1, 1 };
2873-
double truth_D_prime[4] = { 0.7777777777777777, 0.4444444444444444,
2874-
0.4444444444444444, 0.6666666666666666 };
2873+
double truth_D_prime[4] = { 0, -0.5, -0.5, 0 };
28752874
double truth_r[4] = { 0.18377223398316206, -0.12212786219416509,
28762875
-0.12212786219416509, 0.2609542781331212 };
28772876
double truth_Dz[4] = { 0.0033870175616860566, 0.003387017561686057,
@@ -3003,7 +3002,7 @@ test_two_site_uncorrelated_multiallelic(void)
30033002
double truth_D[4] = { 0.05555555555555555, 0.0, 0.0, 0.05555555555555555 };
30043003
double truth_D2[4] = { 0.024691358024691357, 0.0, 0.0, 0.024691358024691357 };
30053004
double truth_r2[4] = { 1, 0, 0, 1 };
3006-
double truth_D_prime[4] = { 0.6666666666666665, 0.0, 0.0, 0.6666666666666665 };
3005+
double truth_D_prime[4] = { -2.7755575615628914e-17, 0, 0, -2.7755575615628914e-17 };
30073006
double truth_r[4] = { 0.24999999999999997, 0.0, 0.0, 0.24999999999999997 };
30083007
double truth_Dz[4] = { 0.0, 0.0, 0.0, 0.0 };
30093008
double truth_pi2[4] = { 0.04938271604938272, 0.04938271604938272,

c/tskit/trees.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4231,7 +4231,7 @@ tsk_treeseq_D_prime(const tsk_treeseq_t *self, tsk_size_t num_sample_sets,
42314231
{
42324232
options |= TSK_STAT_POLARISED; // TODO: allow user to pick?
42334233
return tsk_treeseq_two_locus_count_stat(self, num_sample_sets, sample_set_sizes,
4234-
sample_sets, num_sample_sets, NULL, D_prime_summary_func, norm_hap_weighted,
4234+
sample_sets, num_sample_sets, NULL, D_prime_summary_func, norm_total_weighted,
42354235
num_rows, row_sites, row_positions, num_cols, col_sites, col_positions, options,
42364236
result);
42374237
}

0 commit comments

Comments
 (0)