
Commit be99de7

Add example in documentation
1 parent f2adef6 commit be99de7

File tree

4 files changed: 74 additions & 1 deletion

.gitignore

Lines changed: 1 addition & 1 deletion

```diff
@@ -2,7 +2,6 @@
 .~*#
 .nfs*
 *.mat
-*.png
 *.fig
 *.aux
 *.log
@@ -11,4 +10,5 @@
 *.pdf
 *.gz
 *.ods
+*.eps
 synt/
```

README.md

Lines changed: 73 additions & 0 deletions

@@ -129,4 +129,77 @@ The [plotClusters](plotClusters.m) function returns:

- **h_out** - Figure handle of plot

## Example

In this example, we demonstrate how to test AMVIDC using
[Fisher's iris data](http://en.wikipedia.org/wiki/Iris_flower_data_set),
which is included in the MATLAB Statistics Toolbox. We chose this data set
because it is readily available, not necessarily because AMVIDC is the most
appropriate algorithm to apply in this case. First, we load the data:

```
>> load fisheriris
```

The data set consists of 150 samples, 50 for each of three species of the
Iris flower, with four features (variables) measured per sample. The data
itself loads into the `meas` variable, while the species to which each
sample belongs is given in the `species` variable. The samples are ordered
by species, so the first 50 samples belong to one species, and so on. We
begin by testing the
[k-Means](http://en.wikipedia.org/wiki/K-means_clustering) algorithm,
specifying three clusters, one per species:

```
>> idx_km = kmeans(meas, 3);
```
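
Before scoring the result, it can be informative to cross-tabulate the
cluster assignments against the known species labels. This quick check is
not part of the original example; it assumes the Statistics Toolbox function
`crosstab` is available (rows correspond to clusters, columns to species):

```
>> crosstab(idx_km, species)   % contingency table of clusters vs. species
```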

We can evaluate the performance of k-Means using the [fscore](fscore.m)
function (a value of 1 indicating perfect clustering):

```
>> fscore(idx_km, 3, [50, 50, 50])

ans =

    0.8918
```

Since k-Means uses random initialization, the exact value may vary between
runs.

Visual inspection can be accomplished with the [plotClusters](plotClusters.m)
function. First, we perform [PCA](http://en.wikipedia.org/wiki/Principal_component_analysis)
on the data, a transformation which yields its principal components (i.e.,
the directions along which the data has the largest possible variance). The
first two components are useful for plotting in 2D (even though k-Means was
performed on all four dimensions of the data):

```
>> [~, iris_pca] = princomp(meas);
```
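
As a side check (not part of the original example), `princomp` also returns
the variance along each principal component in its third output, so we can
see what fraction of the total variance the first two components retain:

```
>> [~, iris_pca, latent] = princomp(meas);   % latent holds the per-component variances
>> sum(latent(1:2)) / sum(latent)            % fraction of variance in the first two components
```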

We can now plot the data:

```
>> plotClusters(iris_pca, 2, [50, 50, 50], idx_km);
>> legend(unique(species), 'Location', 'Best');
```

![k-Means clustering of the Iris data set](images/kmeans.png "k-Means clustering of the Iris data set")

AMVIDC is a computationally expensive algorithm, so it is preferable to
apply it to a reduced number of dimensions. The following command applies
AMVIDC clustering to the first two principal components of the data set,
using [pddp](pddp.m) for the initial clustering, and minimizing ellipsoid
volume and direction change:

```
>> idx_amvidc = clusterdata_amvidc(iris_pca(:, 1:2), 3, pddp(iris_pca(:, 1:2)), 'dirweight', 0.6, 'dirpower', 8, 'volume', 'ellipsoid');
```
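
For readability, the same call can be broken into steps; the intermediate
variable names `data2d` and `idx_init` are ours, not part of the original
example:

```
>> data2d = iris_pca(:, 1:2);        % keep only the first two principal components
>> idx_init = pddp(data2d);          % initial clustering with pddp
>> idx_amvidc = clusterdata_amvidc(data2d, 3, idx_init, ...
       'dirweight', 0.6, 'dirpower', 8, 'volume', 'ellipsoid');
```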

The [fscore](fscore.m) evaluation is obtained as follows:

```
>> fscore(idx_amvidc, 3, [50, 50, 50])

ans =

    0.9599
```

This is slightly better than k-Means. Visual inspection also provides
good insight into the clustering result:

```
>> plotClusters(iris_pca, 2, [50, 50, 50], idx_amvidc, 'ellipsoid');
>> legend(unique(species), 'Location', 'Best');
```

![AMVIDC clustering of the Iris data set](images/amvidc.png "AMVIDC clustering of the Iris data set")

images/amvidc.png (51.8 KB)

images/kmeans.png (50.2 KB)
