ML-K-means-clustering-algorithm-and-models

INTRODUCTION

K-Means clustering is an unsupervised learning algorithm that groups an unlabeled dataset into clusters. Here K is the number of clusters, chosen before the algorithm runs: K=2 produces two clusters, K=3 produces three clusters, and so on.

It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of squared distances between each data point and the centroid of its assigned cluster.
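
Written out, the objective is commonly stated as the within-cluster sum of squared Euclidean distances. The notation here is introduced only for illustration and does not come from the repository: C_j is the set of points assigned to cluster j and \mu_j is its centroid.

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```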

The k-means clustering algorithm proceeds in three steps:

Randomly choose k initial centers.
Assign each data point to its closest center. The data points nearest to a given center form a cluster.
Recompute the centroid of each cluster and repeat the assignment until the centroids stop changing.
The number of clusters can be chosen with the elbow method.
We iterate k from 1 to 9 and calculate the distortion and inertia for each value of k.
Distortion: the average of the squared distances from each sample to the center of its cluster. Typically, the Euclidean distance metric is used.
Inertia: the sum of squared distances of samples to their closest cluster center.
If we plot the WCSS (within-cluster sum of squares, i.e. the inertia) against k, the curve bends at an elbow; the k at that bend is the number of clusters to use.
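
The elbow computation can be sketched in plain Python as below. This is a minimal illustration, not the code from this repository: the synthetic three-blob data, the helper names `kmeans` and `inertia`, and the choice of five random restarts per k are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, rng, max_iter=100):
    """Plain k-means from a random start; returns the final centers."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # New center = mean of the cluster; an empty cluster keeps its center.
        updated = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return centers


def inertia(points, centers):
    """Sum of squared distances of samples to their closest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


rng = random.Random(0)
# Synthetic data (an assumption for illustration): three Gaussian blobs.
blob_means = [(0.0, 0.0), (6.0, 6.0), (0.0, 8.0)]
points = [
    (rng.gauss(mx, 0.7), rng.gauss(my, 0.7))
    for mx, my in blob_means
    for _ in range(40)
]

inertias, distortions = {}, {}
for k in range(1, 10):
    # Keep the best of a few random restarts to soften bad initialisations.
    best = min(inertia(points, kmeans(points, k, rng)) for _ in range(5))
    inertias[k] = best
    distortions[k] = best / len(points)  # average squared distance
    print(f"k={k}  inertia={best:9.2f}  distortion={distortions[k]:6.3f}")
```

Plotting `inertias` against k (for example with matplotlib) gives the elbow curve. On data like this the drop should flatten once k reaches the number of blobs, though where the bend sits is a visual judgment.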

The algorithm works by alternating the two steps below until the cluster assignments stop changing.
The "E-step" or "Expectation step" is so named because it updates our expectation of which cluster each point belongs to. The "M-step" or "Maximization step" is so named because it maximizes a fitness function that defines the location of the cluster centers; in k-means this amounts to taking the mean of the points in each cluster.
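
The E-step / M-step loop can be sketched in a few lines of standard-library Python. This is an illustrative version under stated assumptions, not the repository's implementation: the function name `kmeans`, its `init` and `seed` parameters, and the six sample points are hypothetical. Explicit starting centers are passed here so the run is reproducible.

```python
import math
import random


def kmeans(points, k, init=None, seed=0, max_iter=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Start from k randomly chosen data points unless centers are given.
    centers = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # E-step: assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # M-step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, labels = kmeans(points, k=2, init=[(1.0, 1.0), (8.0, 8.0)])
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)
```

Leaving out `init` falls back to a random start, which is the behaviour the steps describe. The result can then depend on the seed.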

When should we use K-means clustering...?

Use it to find clusters in a large, unlabeled dataset.

PROS

It can be used for clustering.
It can be used for classification.
It can be used for colour image compression.
It can be used for image segmentation.

CONS

k-means is limited to linear cluster boundaries, so for non-linear clusters we also use a nearest-neighbour approach alongside it.
k-means can be slow for large numbers of samples.
Clusters depend on the initial positions of the centroids, so we use k-means++ initialization to reduce this sensitivity.
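
The k-means++ seeding rule mentioned above can be sketched as follows. This is a simplified illustration, not a library implementation: the function name `kmeans_pp_init`, the `seed` parameter, and the sample points are hypothetical. The sketch assumes k is no larger than the number of distinct points.

```python
import math
import random


def kmeans_pp_init(points, k, seed=0):
    """k-means++ seeding: spread the initial centers across the data."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        # Sample the next center with probability proportional to d2,
        # so far-away points are favoured and chosen points get weight 0.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
init_centers = kmeans_pp_init(points, k=3)
print(init_centers)
```

The returned centers would replace the purely random starting centers in the first step of the algorithm. The seeding is still random, so it lowers the chance of a poor start without removing it.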

charankamarapu/ML-K-means-clustering-algorithm-and-models
