@@ -129,4 +129,77 @@ The [plotClusters](plotClusters.m) function returns:
- **h_out** - Figure handle of plot
+ ## Example
+
+ In this example, we demonstrate how to test AMVIDC using
+ [Fisher's iris data](http://en.wikipedia.org/wiki/Iris_flower_data_set),
+ which is included in the MATLAB Statistics Toolbox. We chose this data set
+ because it is readily available, not necessarily because AMVIDC is the most
+ appropriate algorithm for it. First, we load the data:
+
+ >> load fisheriris
+
+ The data set consists of 150 samples, 50 for each of three species of
+ the Iris flower, with four features (variables) measured per sample.
+ The data itself loads into the `meas` variable, while the species to
+ which each sample belongs is given in the `species` variable. The
+ samples are ordered by species, so the first 50 samples belong to one
+ species, and so on. First, we test the
+ [k-Means](http://en.wikipedia.org/wiki/K-means_clustering) algorithm,
+ specifying three clusters, one per species:
+
+ >> idx_km = kmeans(meas, 3);
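+
+ Since k-Means starts from a random initialization, the exact assignment
+ (and thus the scores below) can vary slightly between runs. As a quick
+ sanity check (not part of the original example), we can confirm the data
+ dimensions and see how many samples were assigned to each cluster:
+
+ >> size(meas)              % returns [150 4]: 150 samples, 4 features
+ >> accumarray(idx_km, 1)   % number of samples assigned to each cluster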
+
+ We can evaluate the performance of k-Means using the [fscore](fscore.m)
+ function (a value of 1 indicates perfect clustering):
+
+ ```
+ >> fscore(idx_km, 3, [50, 50, 50])
+
+ ans =
+
+     0.8918
+ ```
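+
+ The exact implementation of [fscore](fscore.m) is not reproduced here,
+ but assuming it follows the standard clustering F-measure (each true
+ class is matched to its best-scoring cluster, weighted by class size),
+ a minimal sketch would look like this (`fscore_sketch` is a hypothetical
+ name, not part of this repository):
+
+ ```
+ function f = fscore_sketch(idx, k, sizes)
+ % Sketch of the standard clustering F-measure. idx contains cluster
+ % labels, sizes the true class sizes, with samples grouped by class.
+ n = sum(sizes);
+ edges = [0, cumsum(sizes)];            % class boundaries in sample order
+ f = 0;
+ for i = 1:numel(sizes)                 % for each true class...
+     members = idx(edges(i)+1:edges(i+1));
+     best = 0;
+     for j = 1:k                        % ...find the best-matching cluster
+         tp = sum(members == j);        % class-i samples put in cluster j
+         if tp == 0, continue; end
+         P = tp / sum(idx == j);        % precision
+         R = tp / sizes(i);             % recall
+         best = max(best, 2 * P * R / (P + R));
+     end
+     f = f + (sizes(i) / n) * best;
+ end
+ ```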
+
+ Visual inspection can be accomplished with the [plotClusters](plotClusters.m)
+ function. First, we perform [PCA](http://en.wikipedia.org/wiki/Principal_component_analysis)
+ on the data, transforming it so that we obtain its two principal
+ components (i.e., the components with the largest possible variance).
+ These components are useful for plotting in 2D (even though k-Means was
+ performed on all four dimensions of the data).
+
+ >> [~, iris_pca] = princomp(meas);
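+
+ Note that `princomp` has since been deprecated in favor of `pca` in
+ newer MATLAB releases; if `princomp` is unavailable, an equivalent call
+ would be:
+
+ >> [~, iris_pca] = pca(meas);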
+
+ We can now plot the data:
+
+ >> plotClusters(iris_pca, 2, [50, 50, 50], idx_km);
+ >> legend(unique(species), 'Location', 'Best');
+
+ ![k-Means clustering of the Iris data set](images/kmeans.png "k-Means clustering of the Iris data set")
+
+ AMVIDC is a computationally expensive algorithm, so it is preferable to
+ apply it to a reduced number of dimensions. The following command applies
+ AMVIDC clustering to the first two principal components of the data set,
+ using [pddp](pddp.m) for the initial clustering, with ellipsoid volume
+ and direction change minimization:
+
+ >> idx_amvidc = clusterdata_amvidc(iris_pca(:, 1:2), 3, pddp(iris_pca(:, 1:2)), 'dirweight', 0.6, 'dirpower', 8, 'volume', 'ellipsoid');
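+
+ Since this call can take a while, it may be worth wrapping it with the
+ standard `tic`/`toc` timers to see how long the clustering takes (a
+ trivial addition, not part of the original example):
+
+ >> tic; idx_amvidc = clusterdata_amvidc(iris_pca(:, 1:2), 3, pddp(iris_pca(:, 1:2)), 'dirweight', 0.6, 'dirpower', 8, 'volume', 'ellipsoid'); toc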
+
+ The [fscore](fscore.m) evaluation is obtained as follows:
+
+ ```
+ >> fscore(idx_amvidc, 3, [50, 50, 50])
+
+ ans =
+
+     0.9599
+ ```
+
+ This is slightly better than k-Means. Visual inspection also provides
+ good insight into the clustering result:
+
+ >> plotClusters(iris_pca, 2, [50, 50, 50], idx_amvidc, 'ellipsoid');
+ >> legend(unique(species), 'Location', 'Best');
+
+ ![AMVIDC clustering of the Iris data set](images/amvidc.png "AMVIDC clustering of the Iris data set")