Skip to content

Commit b77bc1b

Browse files
committed
Two-way two-locus statistics (python proto)
This PR implements two-way LD statistics, specified between sample sets. During the development of this functionality, a number of issues with the designation of state_dims/result_dims were discovered. These have been resolved, testing clean for existing code and providing the proper behavior for this new code. The mechanism by which users will specify a multi-population (or two-way) statistic is by providing the `index` argument. This helps us avoid creating another `ld_matrix` method for the TreeSequence object. In other words, for a one-way statistic, a user would specify: ```python ts.ld_matrix(stat="D2", sample_sets=[[ss1, ss2]]) ``` Which would output a 3D ndarray containing one LD matrix per sample set. ```python ts.ld_matrix(stat="D2", sample_sets=[[ss1, ss2]], indexes=[(0, 1)]) ``` Which would output a 2D ndarray containing one LD matrix for the index pair. This would use our `D2_ij_summary_func`, instead of the `D2_summary_func`. Finally, if a user provided ```python ts.ld_matrix(stat="D2", sample_sets=[[ss1, ss2]], indexes=[(0, 1), (1, 1)]) ``` We would output a 3D ndarray containing one LD matrix _per_ index pair provided. Since these are two-way statistics, the indexes must be length 2. We plan on enabling users to implement k-way via a "general_stat" api. We did not implement anything more than two-way statistics here because of the combinatoric explosion of logic required for indexes > 2. I added some basic tests to demonstrate that things were working properly. If we compute two-way statistics on identical sample sets, they should be equal to the one-way statistics. Unfortunately, this does not apply to unbiased statistics, which I've validated manually. I've also cleaned up the docstrings a bit and fixed a bug with the D_prime statistic, which should not be weighted by haplotype frequency.
1 parent c86884d commit b77bc1b

File tree

2 files changed

+410
-72
lines changed

2 files changed

+410
-72
lines changed

c/tskit/trees.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4231,7 +4231,7 @@ tsk_treeseq_D_prime(const tsk_treeseq_t *self, tsk_size_t num_sample_sets,
42314231
{
42324232
options |= TSK_STAT_POLARISED; // TODO: allow user to pick?
42334233
return tsk_treeseq_two_locus_count_stat(self, num_sample_sets, sample_set_sizes,
4234-
sample_sets, num_sample_sets, NULL, D_prime_summary_func, norm_hap_weighted,
4234+
sample_sets, num_sample_sets, NULL, D_prime_summary_func, norm_total_weighted,
42354235
num_rows, row_sites, row_positions, num_cols, col_sites, col_positions, options,
42364236
result);
42374237
}

0 commit comments

Comments
 (0)