-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathcompute_ssp_e2_mmap.m
125 lines (120 loc) · 5.64 KB
/
compute_ssp_e2_mmap.m
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
function [M, B, W] = compute_ssp_e2_mmap(map, metadata, bindex, bytes, pts)
%COMPUTE_SSP_E2_MMAP Computes sums of squares and cross products
% [M, B, W] = COMPUTE_SSP_E2_MMAP(map, metadata, bindex, bytes, pts)
% computes the sums of squares and cross products matrices from memory
% mapped data, allowing to work with large data sets.
%
% The sums of squares and cross product matrices are those matrices used
% to compute the multivariate analysis of variance (MANOVA). They
% represent the "treatment" matrix (B) and the "residual" or "error"
% matrix (W). W can be seen as a scaled pooled covariance matrix, as it
% is in fact the sum of the covariances from each data group.
%
% Given B and W it is possible to compute the total sum of squares and
% products corrected for the mean as:
% B+W = sum_l*sum_j[(x_lj - x_bar)(x_lj-x_bar)']
% where x_lj is the sample value of variable j (1<=j<=nr_points) for
% group l (1<=l<=nr_groups), and x_bar is the overall mean vector.
% The number of degrees of freedom of (B+W) is
% nr_groups*nr_samples_per_groups - 1. See below for more details.
%
% With B, W and B+W it is then possible to compute Wilks' Lambda:
% L = |W| / |B+W|
% in order to test if the hypothesis H0 (there is no statistical
% difference between the groups/treatments) can be rejected or not.
% See the "Statistical Multivariate Analysis" book, p.302.
%
% This method is targeted to the E2-related experiments, where a series
% of bytes are loaded to some registers, and generally the value of a
% particular byte is being analysed while the values of the other bytes
% are either fixed or random.
%
% map should be a memmapfile object with 1 data entry containing at least:
% - transposed data matrix X of size nr_points x nr_trials
% - transposed input bytes matrix B of size nr_bytes x nr_trials
%
% metadata should provide in addition at least the number of groups
% (nr_groups) for which different data was acquired. The assumption here
% is that the data in the matrix X is from random input byte values, as
% specified by the matrix B, and that the number of different byte values
% is given by nr_groups.
%
% bindex should be a vector of indeces used to select which block samples
% will be used to compute the covariance matrix. The data in X is assumed to
% be taken over nr_trials trials, where each consecutive nr_groups trials
% form a block. In each block all the possible values of the target byte
% should be covered. Therefore, when selecting samples corresponding to a
% particular byte value, this method will select one sample (trial) from
% each block, for a total of nr_blocks samples. The bindex vector should
% provide a filter to select only the specified samples, allowing to test
% different selection sizes. The values in bindex should be between 1 and
% nr_blocks inclusive. nr_blocks can be computed as nr_trials/nr_groups
% from the metadata information.
%
% bytes should be a vector of byte values that will select which groups
% to consider in order to compute the covariance matrix. Each index in
% the bytes vector should be between 0 and (nr_groups-1).
% Note: group 1 corresponds to byte value 0, group 2 to byte value
% 1, ..., group 256 to byte value 255.
%
% pts should be a vector of indeces that specifies which points from each
% trace will be used to create SSP matrices. If not given or empty then
% the default from metadata (1:metadata.nr_points) will be used.
%
% The outputs of this method are:
% - M: a matrix of size nr_groups x nr_points, having the mean vector
% for each group specified by the "bytes" parameter.
% - B: the "treatment" matrix, containing the variance caused only by the
% mean vectors of each group. The number of degrees of freedom of B is
% df_B = nr_groups - 1
% where nr_groups is given by the first dimension of M.
% - W: the "residual" matrix, containing the pooled (sumed) covariance
% of the data within each group. The number of degrees of freedom of W is
% df_W = nr_groups x (nr_samples_per_group - 1)
% where nr_samples_per_group is given by the length of the supplied
% parameter "bindex".
% Both B and W are matrices of size nr_points x nr_points.
%
% Note: W can be used as a "pooled" covariance matrix, i.e. a better
% estimate of a covariance matrix, when the covariance of each group is
% similar. However, in this case, W should be divided by the number of
% degrees of freedom df_W = nr_groups x (nr_samples_per_group - 1).
%
% See also compute_features_e2_mmap, get_mmap.
%% Check and initialize data
nr_groups = metadata.nr_groups;
nr_bytes = length(bytes);
if min(bytes) < 0 || max(bytes) > (nr_groups-1)
error('Some index in bytes is outside available groups');
end
if nargin < 5 || isempty(pts)
nr_points = metadata.nr_points;
pts = 1:nr_points;
else
pts = pts(:);
nr_points = length(pts);
end
M = zeros(nr_groups, nr_points);
B = zeros(nr_points, nr_points);
W = zeros(nr_points, nr_points);
nr_samples_per_group = length(bindex);
%% Compute the group means and the residual matrix W
fprintf('Computing the mean vectors M and the residual matrix W...\n');
for k=1:nr_bytes
kindex = find(map.data(1).B(2,:)==bytes(k));
lindex = kindex(bindex);
% Below note transpose and conversion in case we had integer class
L = double(map.data(1).X(pts,lindex)');
M(k,:) = mean(L, 1);
X = L - ones(nr_samples_per_group, 1)*M(k,:);
W = W + X'*X;
end
mvec = mean(M, 1);
%% Compute the treatment matrix B
fprintf('Computing the treatment matrix B...\n');
for k=1:nr_bytes
xm = M(k,:) - mvec;
B = B + xm'*xm;
end
B = B * nr_samples_per_group;
end