Introduction
============

The AMVIDC algorithm is presented in detail in the following
publication:

"Spectrometric differentiation of yeast strains using Minimum Volume
Increase and Minimum Direction Change clustering criteria", N. Fachada,
M.T. Figueiredo, V.V. Lopes, R.C. Martins and A.C. Rosa. Pattern
Recognition Letters, 2014 (IN PRESS)

Data format
-----------

Typically, data is presented as a set of samples (or points), each with
a constant number of dimensions. As such, for the rest of this guide,
data matrices are considered to be in the following format:

- *m* x *n*, with *m* samples (points) and *n* dimensions (variables)

Often the number of dimensions is too high, making clustering
inefficient. When this occurs, data dimensionality can be reduced using
a number of techniques. In this work, PCA and SVD (which are closely
related) are used via the native Matlab `princomp` and `svd`/`svds`
functions.
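
A minimal sketch of this preprocessing step is shown below, assuming a
data matrix `X` in the *m* x *n* format described above (keeping 3
dimensions is an arbitrary choice for illustration):

    % Reduce an m x n data matrix X to its first 3 principal components
    [coeff, score] = princomp(X);  % component scores, ordered by variance
    Xr = score(:, 1:3);            % keep the 3 strongest components

    % Roughly equivalent reduction via truncated SVD of the centered data
    Xc = bsxfun(@minus, X, mean(X, 1));  % center each variable
    [U, S, V] = svds(Xc, 3);             % 3 largest singular triplets
    Xr_svd = Xc * V;                     % project onto the right singular vectors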

Generating data
---------------

This code was inspired by the differentiation of spectrometric data.
However, to further validate the clustering algorithms, synthetic data
sets can be generated with the `generateData` function. This function
generates data in the *m* x *n* format, with *m* samples (points) and
*n* dimensions (variables), according to a set of parameters which are
explained in the source code.

Running the algorithm
=====================

The algorithm is based on agglomerative hierarchical clustering (AHC),
using the Minimum Volume Increase (MVI) and Minimum Direction Change
(MDC) clustering criteria. It can be tested using the
`clusterdata_amvidc` function:

    idx = clusterdata_amvidc(X, k, idx_init);

where **X**, **k** and **idx\_init** are the data matrix, the maximum
number of clusters and the initial clustering, respectively. An initial
clustering is required so that all possible new clusters have volume, a
requirement of MVI. The `clusterdata_amvidc` function has many optional
parameters with reasonable defaults, as specified in the following
table:

| Parameter   | Default         | Options/Description |
| ----------- | --------------- | ------------------- |
| *volume*    | `convhull`      | Volume type: `ellipsoid` or `convhull` |
| *tol*       | 0.01            | Tolerance for minimum volume ellipse calculation (`ellipsoid` volume only) |
| *dirweight* | 0               | Direction weight in the last iteration (0 means MDC linkage is ignored) |
| *dirpower*  | *dirweight* > 0 | Convergence power of *dirweight* (higher values make convergence steeper and occur closer to the end) |
| *dirtype*   | `svd`           | Direction type: `pca` or `svd` |
| *nvi*       | true            | Allow negative volume increase? |
| *loglevel*  | 3               | Log level, from 0 (show all messages) to 4 (only show critical errors); the default of 3 shows warnings only |

For example, to perform clustering using ellipsoid volume while taking
direction change into account, with cluster direction determined using
PCA, one would do:

    idx = clusterdata_amvidc(X, k, idx_init, 'volume', 'ellipsoid', 'dirweight', 0.5, 'dirpower', 4, 'dirtype', 'pca');

As specified, the `clusterdata_amvidc` function requires initial
clusters which, if joined, produce new clusters with volume. Two
clustering functions in this package are appropriate for this purpose,
although others can be used (see the sketch after this list):

- initClust.m - Performs a very simple initial clustering based on AHC
  with single linkage (nearest neighbor) and a user-defined distance.
  Each sample is assigned to the same cluster as its nearest point.
  Allows the user to define a minimum size for each cluster, the
  distance type (as supported by Matlab `pdist`) and the number of
  clusters which are allowed to have less than the minimum size.
- pddp.m - Performs PDDP (principal direction divisive partitioning)
  on the input data. This implementation always selects the largest
  cluster for division, and the algorithm proceeds while the division
  of a cluster yields sub-clusters which can have a volume.
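
As a sketch of how an initial clustering feeds into
`clusterdata_amvidc` (the single-argument `pddp` call below is an
assumption, not a documented signature; consult the function's help
text for the actual parameter list):

    % Hypothetical pipeline: obtain an initial clustering via PDDP, then
    % run AMVIDC on top of it. The pddp argument list is assumed here.
    idx_init = pddp(X);                        % initial clusters
    idx = clusterdata_amvidc(X, k, idx_init);  % final AMVIDC clustering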

Analysis of results
===================

F-score
-------

In this work, the [F-score](http://en.wikipedia.org/wiki/F1_score)
measure was used to evaluate clustering results. The `fscore` function
(in fscore.m) was developed for this purpose. To run this function, do:

    eval = fscore(idx, numclasses, numclassmembers);

where:

- **idx** - *m* x 1 vector containing the cluster indices of each
  point (as returned by the clustering functions)
- **numclasses** - Correct number of clusters
- **numclassmembers** - Vector with the correct size of each cluster
  (or a scalar if all clusters are of the same size)

The `fscore` function returns:

- **eval** - Value between 0 (worst case) and 1 (perfect clustering)
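
For example, assuming `idx` resulted from clustering a data set known
to contain three classes of 50 points each (the class count and sizes
here are illustrative):

    % Evaluate a clustering of 150 points drawn from 3 known classes
    % of 50 points each (idx as returned by clusterdata_amvidc)
    eval = fscore(idx, 3, 50);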

Plotting clusters
-----------------

Sometimes visualizing how an algorithm grouped clusters can provide
important insight into its effectiveness. It may also be important to
visually compare an algorithm's clustering result with the correct
result. These are the goals of the `plotClusters` function, which can
show two clustering results in the same image (e.g. the correct one and
one returned by an algorithm). You can run `plotClusters` in the
following way:

    h_out = plotClusters(X, dims, idx_marker, idx_encircle, encircle_method, h_in);

where:

- **X** - Data matrix, *m* x *n*, with *m* samples (points) and *n*
  dimensions (variables)
- **dims** - Number of dimensions (2 or 3)
- **idx_marker** - Clustering result ^1^ to be shown directly on
  points using markers
- **idx_encircle** - Clustering result ^1^ to be shown using
  encirclement/grouping of points
- **encircle_method** - How to encircle the **idx_encircle** result:
  `convhull` (default), `ellipsoid` or `none`
- **h_in** - (Optional) Existing figure handle in which to create the
  plot

^1^ *m* x 1 vector containing the cluster indices of each point

The `plotClusters` function returns:

- **h_out** - Figure handle of the plot
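
For example, to compare a known (correct) clustering with an AMVIDC
result in two dimensions, encircling the algorithm's clusters with
convex hulls (the variable names `idx_correct` and `idx` are
illustrative):

    % Correct clustering shown with markers, AMVIDC result shown with
    % convex hull encirclement, in a 2D plot
    h = plotClusters(X, 2, idx_correct, idx, 'convhull');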