Gap Analysis for Determining K in K-Means Clustering

Myeong Lee

University of Maryland College Park (iSchool)

This code determines K, the number of clusters in K-means clustering, using the gap analysis (gap statistic) method. The original code was developed by DataScienceLab (https://datasciencelab.wordpress.com/2013/12/27/finding-the-k-in-k-means-clustering/).
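For readers new to the method, here is a minimal sketch of the gap statistic as defined by Tibshirani, Walther and Hastie. For each candidate k it compares log W_k, the log of the pooled within-cluster sum of squares of the data, with its expected value under uniform reference data drawn from the data's bounding box: Gap(k) = mean(log W*_k) - log W_k. K is then chosen as the smallest k with Gap(k) >= Gap(k+1) - s_(k+1). The function names (`kmeans`, `gap_statistic`, `choose_k`, `_kmeanspp_init`) are hypothetical, and the k-means++ seeding and restart count are assumptions. This is not the code in the repository's kmean.py.

```python
import numpy as np

# Sketch of the gap statistic; all names are hypothetical, not from kmean.py.


def _kmeanspp_init(X, k, rng):
    """k-means++ seeding (an assumption here): spreads the initial centroids out."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest already-chosen centroid.
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        total = d2.sum()
        probs = d2 / total if total > 0 else None  # None means a uniform choice
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids, dtype=float)


def kmeans(X, k, rng, n_init=5, n_iter=100):
    """Lloyd's k-means; returns (W_k, labels) of the best of n_init runs.

    W_k is the pooled within-cluster sum of squared distances to the centroids.
    """
    best_w, best_labels = np.inf, None
    for _ in range(n_init):
        centroids = _kmeanspp_init(X, k, rng)
        for _ in range(n_iter):
            # Assign each point to its nearest centroid.
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Move each centroid to the mean of its points (keep it if the cluster is empty).
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        w = float(((X - centroids[labels]) ** 2).sum())
        if w < best_w:
            best_w, best_labels = w, labels
    return best_w, best_labels


def gap_statistic(X, k_max, n_refs=10, seed=0):
    """Return (gaps, s) for k = 1..k_max, using uniform reference data
    drawn from the bounding box of X."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = [], []
    for k in range(1, k_max + 1):
        w_data, _ = kmeans(X, k, rng)
        ref_logs = np.array([np.log(kmeans(rng.uniform(lo, hi, size=X.shape), k, rng)[0])
                             for _ in range(n_refs)])
        gaps.append(ref_logs.mean() - np.log(w_data))
        # Standard deviation of the reference log W_k, inflated for simulation error.
        s.append(ref_logs.std() * np.sqrt(1 + 1 / n_refs))
    return np.array(gaps), np.array(s)


def choose_k(gaps, s):
    """Smallest k with Gap(k) >= Gap(k+1) - s_(k+1)."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - s[i + 1]:
            return i + 1
    return len(gaps)
```

For example, on three well-separated Gaussian blobs, `choose_k(*gap_statistic(X, k_max=6))` should return 3. Because the reference data are random, results on less clearly clustered data can vary with `seed` and `n_refs`.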

The original code (1) handled only 2-tuple (two-dimensional) vectors and (2) did not keep vector IDs, so individual data points could not be tracked through the clustering. This modified implementation addresses both issues.

There are two sets of functions: those whose names begin with the prefix "new_" and those without it. Functions prefixed with "new_" maintain vector IDs; the others do not. All of the functions also work with n-dimensional vectors.
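To illustrate what "maintaining vector IDs" means in practice, here is a minimal sketch of k-means over n-dimensional vectors that reports cluster membership by ID rather than by position. It assumes the input is a dictionary mapping each ID to a vector of any fixed length. The function name `kmeans_with_ids` and its signature are hypothetical; this is not one of the repository's "new_" functions.

```python
import numpy as np


def kmeans_with_ids(data, k, seed=0, n_iter=100):
    """Hypothetical helper: cluster n-dimensional vectors while tracking their IDs.

    data: dict mapping an ID (any hashable) to a vector; every vector must
    have the same length d, but d can be anything.
    Returns (centroids, clusters), where clusters maps cluster index -> list of IDs.
    """
    ids = list(data)                                   # fix one order for IDs and rows
    X = np.array([data[i] for i in ids], dtype=float)  # shape (n, d)
    rng = np.random.default_rng(seed)
    # Plain random initialization: k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Map row positions back to the original IDs.
    clusters = {j: [] for j in range(k)}
    for vid, lab in zip(ids, labels):
        clusters[int(lab)].append(vid)
    return centroids, clusters
```

Because the IDs and the rows of `X` share a single ordering, each label can be mapped back to the ID of the vector it belongs to, whatever the dimensionality of the vectors.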

Feel free to use or modify the code. Questions? Contact ([email protected])
