The program takes the data file as a command-line argument. The choice of k-means or fuzzy c-means, the number of runs (r), and the number of clusters (k) are entered interactively. The chosen algorithm is run r times, and each run returns a solution. The solution with the minimum within-cluster sum of squares (WCSS) is selected and displayed at the end. During the runs, intermediate solutions are plotted only at iterations whose number is divisible by the ‘interval’ value, which reduces memory usage when the algorithms approach 100 iterations.
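The restart-and-select procedure described above can be sketched roughly as follows. This is a minimal illustration for the k-means case only, not the program's actual code: the function names (`wcss`, `assign`, `kmeans_once`, `best_of_r`), the initialisation from randomly chosen data points, and the 100-iteration cap are all assumptions, and the sketch records the WCSS every `interval` iterations where the program would draw a plot.

```python
import random

def wcss(points, centroids, labels):
    """Within-cluster sum of squares: squared distance of each point to its centroid."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(point, centroids[label]))
        for point, label in zip(points, labels)
    )

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)),
            key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))
        for point in points
    ]

def kmeans_once(points, k, rng, max_iter=100, interval=10):
    """One k-means run; returns (wcss, centroids, labels, snapshots)."""
    centroids = rng.sample(points, k)  # assumption: start from k random data points
    labels = assign(points, centroids)
    snapshots = []
    for iteration in range(1, max_iter + 1):
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
        new_labels = assign(points, centroids)
        # Record progress only every `interval` iterations (stand-in for plotting).
        if iteration % interval == 0:
            snapshots.append((iteration, wcss(points, centroids, new_labels)))
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return wcss(points, centroids, labels), centroids, labels, snapshots

def best_of_r(points, k, r, seed=0):
    """Run k-means r times and keep the run with the lowest WCSS."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(r)]
    return min(runs, key=lambda run: run[0])
```

Calling `best_of_r(points, k=3, r=10)` on a list of coordinate tuples returns the lowest-WCSS run. Repeating the run r times matters because each run can settle in a different local minimum depending on its starting centroids.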
The following plots show the progress of the algorithm across iterations. However, since they were all produced with r = 10, they are not guaranteed to come from the same run (e.g. the plot for iteration 7 may not continue from the plot for iteration 4, as the two may come from different r-runs). Even so, they show the WCSS being minimized over the iterations.
For k-means, the plots show that the best WCSS value decreased as the number of clusters increased, so that the largest number of clusters attempted, 7, had the best (lowest) WCSS value at ~550.
For fuzzy c-means, the plots likewise show that the best WCSS value decreased as the number of clusters increased, so that the largest number of clusters attempted, 7, had the best (lowest) WCSS value at ~640. Interestingly, the fuzzy c-means algorithm had consistently higher (worse) WCSS values than the k-means algorithm for the same number of clusters.
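One possible explanation for this gap, offered as an assumption rather than something verified against the program, is that the two algorithms minimize different objectives. In their standard textbook forms (the symbols below are not taken from the program), with data points x_i, centroids c_j, hard clusters C_j, memberships u_ij and fuzzifier m > 1:

```latex
\mathrm{WCSS} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2
\qquad
J_m = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{\,m} \, \lVert x_i - c_j \rVert^2 ,
\quad \sum_{j=1}^{k} u_{ij} = 1
```

K-means updates its centroids to reduce the WCSS directly, whereas fuzzy c-means places its centroids to reduce J_m, so its centroids need not be the ones that minimize the WCSS for the same k.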