Proposal: Indices for Clustering. #11369
Comments
I don't think it is in our interest to implement and maintain numerous, rarely used metrics, when we're not able to also advise what the benefits or uses of them are. See related comments on inclusion criteria at our FAQ. I can also immediately see that you are missing several clustering metrics (many of which I know from the related coreference resolution evaluation literature) I'd be interested in potentially supporting:
Every feature we include has a maintenance cost. Our maintainers are mostly
Internal indices:

- Score Function: works well for hyper-spheroidal data sets. It is shown to work well on multidimensional data.
- Davies–Bouldin index: validates how well the clustering has been done using quantities and features inherent to the dataset.
- Dunn index: ratio of the minimal inter-cluster distance to the maximal intra-cluster distance.
- Hartigan index: generally used to find the number of clusters in a dataset (used only with the K-Means algorithm).

External indices:

- Entropy: degree to which each cluster contains objects of a single class.
- Kulczynski index: arithmetic mean of the precision and recall coefficients.
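To make the Dunn index definition concrete, here is a minimal pure-Python sketch. It assumes Euclidean distance, takes the closest pair of points from different clusters as the inter-cluster distance, and takes the largest within-cluster diameter as the intra-cluster distance. Other variants exist. The function name `dunn_index` is hypothetical and is not part of scikit-learn.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Hypothetical helper: min inter-cluster distance / max cluster diameter."""
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest distance between two points of the same cluster (diameter).
    max_diameter = max(
        (dist(p, q)
         for members in clusters.values()
         for p, q in combinations(members, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        raise ValueError("every cluster has zero diameter; index undefined")

    # Smallest distance between two points of different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters.values(), 2)
        for p in a
        for q in b
    )
    return min_separation / max_diameter


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good_labels = [0, 0, 1, 1]  # two compact, well-separated clusters
bad_labels = [0, 1, 0, 1]   # clusters that straddle the gap

print(dunn_index(points, good_labels))
print(dunn_index(points, bad_labels))
```

Higher values indicate compact, well-separated clusters, so the first labelling should score higher than the second. The brute-force pairwise loops are quadratic in the number of points, which is fine for illustration but not for large datasets.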
@cmarmo Thanks for the RFC. The best internal indices reference implementation is in https://github.com/Simon-Bertrand/Clusters-Features/blob/main/ClustersFeatures/src/_score_index.py. Also, for external indices, I have noted another table for external evaluation (suitable for clustering and community detection): GiulioRossetti/cdlib#147 (comment). Calling back to #1362.
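As an illustration of an external index from this thread, the sketch below computes the Kulczynski index as the arithmetic mean of pair-counting precision and recall. The pair-counting formulation is an assumption about how precision and recall are defined here, and the names `pair_counts` and `kulczynski_index` are hypothetical. They are not taken from scikit-learn or from the packages linked above.

```python
from itertools import combinations


def pair_counts(labels_true, labels_pred):
    """Count point pairs: TP (together in both), FP (pred only), FN (true only)."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    return tp, fp, fn


def kulczynski_index(labels_true, labels_pred):
    """Hypothetical helper: mean of pair-counting precision and recall."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    tp, fp, fn = pair_counts(labels_true, labels_pred)
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for these labels")
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (precision + recall) / 2


labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 1, 1]  # one point assigned to the wrong cluster

print(kulczynski_index(labels_true, labels_pred))
```

Because only co-membership of pairs is compared, the result does not depend on how the cluster labels are named. A perfect match gives 1.0.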
I have implemented cluster validation indices, both internal and external, as part of my package.
I have 40 such indices, which are tested and packaged in a package called CRAVED. I would like to merge these indices into scikit-learn's clustering metrics. Please let me know how I can get started with this.