Commit 2fd234e

replace kmeans() with kmeans2()
This might be a solution for issue #10
1 parent 0e5b69d commit 2fd234e

3 files changed: +12 -4 lines changed

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -1,3 +1,10 @@
+# 0.0.6 - 2022-02-02
+
+- Replace `scipy.cluster.vq.kmeans` with `scipy.cluster.vq.kmeans2` to address
+  issue #10 where we learned that kmeans does not always return k centroids,
+  but kmeans2 does return k centroids. Thanks to @onionpork and @DennisPost10
+  for reporting this.
+
 # 0.0.5 - 2020-08-11

 - Expose `max_iter_harmony` as a new top-level argument, in addition to the
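
The 0.0.6 changelog entry rests on a difference in how the two SciPy functions treat empty clusters: `kmeans` removes centroids that end up with no observations, while `kmeans2` keeps them (and warns by default). The sketch below is one way to see this on toy data. The data, the variable names, and the choice of `minit="points"` are assumptions made for illustration and are not taken from harmonypy.

```python
import warnings

import numpy as np
from scipy.cluster.vq import kmeans, kmeans2

# Toy data (an assumption for illustration): 3 distinct points, 10 copies each.
distinct = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
obs = np.repeat(distinct, 10, axis=0)  # shape (30, 2)
k = 5  # ask for more clusters than there are distinct points

# kmeans drops centroids that end up with no observations assigned to them,
# so the returned codebook can have fewer than k rows.
codebook, distortion = kmeans(obs, k, iter=10)

# kmeans2 keeps the previous centroid for an empty cluster (and warns),
# so it returns k rows. minit="points" is used here instead of the commit's
# minit='++' only because this toy data consists of duplicated points.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the empty-cluster warning
    centroids, labels = kmeans2(obs, k, minit="points")

print("kmeans rows: ", codebook.shape[0])
print("kmeans2 rows:", centroids.shape[0])
```

With only three distinct points, `kmeans` cannot keep more than three non-empty clusters, so its codebook is shorter than `k`, whereas the `kmeans2` result always has `k` rows. Code that indexes the centroid matrix by cluster number from 0 to `k - 1` would therefore be expected to fail only with the former.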

harmonypy/harmony.py

Lines changed: 4 additions & 3 deletions
@@ -17,7 +17,8 @@

 import pandas as pd
 import numpy as np
-from scipy.cluster.vq import kmeans
+# kmeans does not always return k centroids, but kmeans2 does
+from scipy.cluster.vq import kmeans2
 import logging

 # create logger
@@ -185,8 +186,8 @@ def allocate_buffers(self):

     def init_cluster(self):
         # Start with cluster centroids
-        km = kmeans(self.Z_cos.T, self.K, iter=10)
-        self.Y = km[0].T
+        km_centroids, km_labels = kmeans2(self.Z_cos.T, self.K, minit='++')
+        self.Y = km_centroids.T
         # (1) Normalize
         self.Y = self.Y / np.linalg.norm(self.Y, ord=2, axis=0)
         # (2) Assign cluster probabilities
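
To make the shapes in the new `init_cluster` lines concrete, here is a standalone sketch of the same initialization pattern outside the class. It assumes `Z_cos` is a features-by-cells matrix whose columns have unit L2 norm, which is what the `.T` calls in the hunk suggest but the diff does not show. The sizes `d`, `N`, `K` and the random data are hypothetical.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Hypothetical sizes: d features, N cells, K clusters.
d, N, K = 4, 200, 5
rng = np.random.default_rng(0)

# Stand-in for self.Z_cos (assumed layout: d x N, columns scaled to unit length).
Z = rng.normal(size=(d, N))
Z_cos = Z / np.linalg.norm(Z, ord=2, axis=0)

# kmeans2 expects observations in rows, hence the transpose.
# It returns a (K, d) centroid array and one label per observation.
km_centroids, km_labels = kmeans2(Z_cos.T, K, minit='++')

# Back to the d x K layout, then normalize each centroid column as in step (1).
Y = km_centroids.T
Y = Y / np.linalg.norm(Y, ord=2, axis=0)

print(Y.shape, km_labels.shape)
```

One point worth checking when reviewing the change: the `iter` argument means different things in the two functions. In `kmeans` it is the number of independent runs, of which the lowest-distortion codebook is returned, while in `kmeans2` it is the number of iterations of a single run. Dropping `iter=10` therefore changes more than the return type.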

setup.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@

 setuptools.setup(
     name = "harmonypy",
-    version = "0.0.5",
+    version = "0.0.6",
     author = "Kamil Slowikowski",
     author_email = "[email protected]",
     description = "A data integration algorithm.",
