Initialization is using euclidean distance #18
Comments
Thank you, I'll look into it.
May I ask what the status of this issue is? I looked into the commit log but to no avail. Many thanks!
Hi @rtrad89 or @t103z, I'm not sure how much this will actually matter, since this is only used for initialization (and sklearn has chosen not to provide a distance-configurable version of kmeans++). The code for the initialization is here: https://github.com/scikit-learn/scikit-learn/blob/0.22.X/sklearn/cluster/_k_means.py#L41. Do you want to take a stab at a PR to port it for cosine distance?
spherecluster/spherecluster/spherical_kmeans.py
Lines 44 to 48 in 701b0b1
I might be getting this wrong, but the code here seems to use the initialization function from sklearn. This could cause issues, since the kmeans++ initialization in sklearn is based on Euclidean distance. It should be replaced with a version based on cosine distance.
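For reference, here is a minimal sketch of what k-means++ style seeding with cosine distance could look like. This is not the sklearn or spherecluster implementation: the function name `cosine_kmeans_pp_init` and its parameters are hypothetical, it assumes a dense NumPy array, and it skips the greedy local trials that sklearn's version performs. Each new center is sampled with probability proportional to its cosine distance from the nearest center chosen so far.

```python
import numpy as np


def cosine_kmeans_pp_init(X, n_clusters, seed=None):
    """Hypothetical helper: k-means++ style seeding using cosine distance."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Normalize rows to unit length so cosine similarity is a plain dot product.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    n_samples, n_features = Xn.shape

    centers = np.empty((n_clusters, n_features))

    # First center: chosen uniformly at random.
    centers[0] = Xn[rng.integers(n_samples)]

    # Cosine distance (1 - cos) from every point to its nearest center so far.
    closest = np.clip(1.0 - Xn @ centers[0], 0.0, None)

    for k in range(1, n_clusters):
        total = closest.sum()
        if total <= 0:
            # Every point coincides with an existing center; fall back to uniform.
            idx = rng.integers(n_samples)
        else:
            # Sample the next center proportionally to cosine distance.
            idx = rng.choice(n_samples, p=closest / total)
        centers[k] = Xn[idx]
        dist_to_new = np.clip(1.0 - Xn @ centers[k], 0.0, None)
        closest = np.minimum(closest, dist_to_new)

    return centers
```

One thing worth keeping in mind: for unit-length rows, squared Euclidean distance equals 2 * (1 - cosine similarity), so on already-normalized data the Euclidean D² weighting and this cosine weighting assign the same sampling probabilities.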