This PR implements two-way LD statistics, specified between sample sets.
During the development of this functionality, a number of issues with
the designation of state_dims/result_dims were discovered. These have
been resolved, testing clean for existing code and providing the proper
behavior for this new code.
Users specify a multi-population (or two-way) statistic by providing
the `indexes` argument. This avoids adding another `ld_matrix` method
to the TreeSequence object.
For a one-way statistic, a user would specify:
```python
ts.ld_matrix(stat="D2", sample_sets=[ss1, ss2])
```
This outputs a 3D ndarray containing one LD matrix per sample set.
For a two-way statistic, a user would instead specify:
```python
ts.ld_matrix(stat="D2", sample_sets=[ss1, ss2], indexes=[(0, 1)])
```
This outputs a 2D ndarray containing one LD matrix for the index
pair, computed with our `D2_ij_summary_func` instead of
`D2_summary_func`. Finally, if a user provided
```python
ts.ld_matrix(stat="D2", sample_sets=[ss1, ss2], indexes=[(0, 1), (1, 1)])
```
we would output a 3D ndarray containing one LD matrix _per_ index pair
provided.
Since these are two-way statistics, each index tuple must have length 2.
We plan to let users implement k-way statistics via a "general_stat" API.
We did not implement anything beyond two-way statistics here because of
the combinatorial explosion of logic required for index tuples longer
than 2.
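The length-2 constraint can be sketched as a small check. This is only an illustration: `check_index_pairs` is a hypothetical helper, not the validation code in this PR, and the range check on sample set positions is an assumption about what a sensible validator would do.

```python
def check_index_pairs(indexes, num_sample_sets):
    """Hypothetical helper: validate two-way index tuples.

    Returns the number of LD matrices that would be produced,
    one per index pair.
    """
    for pair in indexes:
        # Two-way statistics compare exactly two sample sets.
        if len(pair) != 2:
            raise ValueError(f"index tuple {pair!r} must have length 2")
        for k in pair:
            # Assumption: each entry is a position in sample_sets.
            if not 0 <= k < num_sample_sets:
                raise ValueError(f"index {k} out of range in {pair!r}")
    return len(indexes)


# Two sample sets and two index pairs give two LD matrices.
num_matrices = check_index_pairs([(0, 1), (1, 1)], num_sample_sets=2)
```

A tuple such as `(0, 1, 1)` would be rejected here, which mirrors the decision to leave k-way statistics to a future "general_stat" API.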
I added some basic tests to demonstrate that things work properly.
If we compute two-way statistics on identical sample sets, they should
equal the one-way statistics. Unfortunately, this equivalence does not
hold for the unbiased statistics, so I validated those manually.
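A minimal pure-Python sketch of why the identity holds for the biased D2, assuming the two-way statistic is the product of the per-set D values (D_i * D_j) and the one-way statistic is D squared. The function names and the toy haplotype data are hypothetical, and this is not the tskit implementation.

```python
import math


def haplotype_d(haplotypes):
    """Return D = p_AB - p_A * p_B for a list of (a, b) 0/1 allele pairs."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    return p_ab - p_a * p_b


def d2_one_way(sample_set):
    """Biased one-way D2: the square of D within one sample set."""
    return haplotype_d(sample_set) ** 2


def d2_two_way(set_i, set_j):
    """Assumed biased two-way D2: product of D in each sample set."""
    return haplotype_d(set_i) * haplotype_d(set_j)


# Toy haplotypes at two biallelic sites, one (a, b) pair per sample.
ss1 = [(1, 1), (1, 1), (0, 0), (1, 0)]
ss2 = [(1, 1), (0, 1), (0, 1), (0, 0)]

# On identical sample sets the two-way value matches the one-way value.
same_set_matches = math.isclose(d2_two_way(ss1, ss1), d2_one_way(ss1))

# On different sample sets the two values generally differ.
cross_value = d2_two_way(ss1, ss2)
```

The unbiased estimators sample without replacement within a sample set, so the within-set and between-set formulas differ in form; that is consistent with the equivalence not carrying over to them.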
I've also cleaned up the docstrings a bit and fixed a bug with the
D_prime statistic, which should not be weighted by haplotype frequency.