-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathplay.tex
677 lines (523 loc) · 39.1 KB
/
play.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
\clearpage
This is a tutorial document that may one day grow into a paper.
The project behind this tutorial is to use relational algebra techniques to optimize linear algebra programs, such as those in machine learning.
The goal is to achieve performance speedups by finding better plans through relational techniques.
\section{Motivation}
\begin{figure}[tbh]
\includegraphics[trim={3cm 2cm 2cm 0},clip,width=1\linewidth]{la_to_ra_and_back}
\caption{For a given query MPlan, it is difficult to find the lowest-cost semantically equivalent MPlan due an incomplete basis: MPlan operators are compound and can be decomposed into more primitive parts. Rewriting to a canonical RPlan grants the ability to find otherwise unreachable equivalent MPlans.}
\label{fOverallPlan}
\end{figure}
\begin{figure}[tbh]
\includegraphics[trim={0 0 0 0},clip,width=1\linewidth]{la_to_ra_and_back_flowchart}
\caption{Approach to finding the lowest-cost semantically equivalent MPlan by searching through a space generated by canonical RPlans.}
\label{fOverallPlanFlowchart}
\end{figure}
\section{$ABC+BCD+ABD$}
\begin{figure*}[tbh]
\subcaptionbox{Cost in flops for a few plans computing $ABC+BCD+ABD$\label{tExCost}}{
\begin{tabular}{c|ccc}
plan & cost (dense)
& \begin{tabular}{@{}c@{}}$n_1=2$\\$n_2=10$\end{tabular}
& \begin{tabular}{@{}c@{}}$n_1=10$\\$n_2=2$\end{tabular} \\
\hline
$AB(C+D) + (BC)D$ & $2n_1^2 n_2 + 6n_1 n_2^2 + n_2^2 + n_1 n_2$ & 1400 & \textbf{664} \\
$(BC+AB)D + (AB)C$ & $2n_1^2 n_2 + 6n_1 n_2^2 + 2n_1 n_2$ & \textbf{1320} & 680 \\
$(AB)C + (BC)D + (AB)D$ & $2n_1^2 n_2 + 8n_1 n_2^2 + 3n_1 n_2$ & 1740 & 780 \\
$A(BC) + (BC)D + (AB)D$ & $4n_1^2 n_2 + 6n_1 n_2^2 + 3n_1 n_2$ & 1420 & 1100 \\
\end{tabular}
}
\subcaptionbox{Plot of $n_2:n_1$ ratio vs. plan cost\label{fExPlot}}{
\includegraphics[width=0.32\linewidth]{ex_compare}
}
\end{figure*}
Consider the following DML program, in which $n_1$ and $n_2$ are parameters that determine the dimensions of the matrices $A$, $B$, $C$, $D$.
The \texttt{matrix} function creates dense matrices with the given dimensions, filled with the value 1.
\begin{lstlisting}[language=R,escapeinside={(*}{*)},frame=single,caption={DML Program for $ABC+BCD+ABD$},label={lExDML}]
A = matrix(1, rows=(*$n_1$*), cols=(*$n_1$*))
B = matrix(1, rows=(*$n_1$*), cols=(*$n_2$*))
C = matrix(1, rows=(*$n_2$*), cols=(*$n_2$*))
D = matrix(1, rows=(*$n_2$*), cols=(*$n_2$*))
R = A%*%B%*%C + B%*%C%*%D + A%*%B%*%D
write(R, "result.csv")
\end{lstlisting}
\subsection{Manual analysis of plans}
What is the optimal plan to compute $R$?
The answer depends on $n_1$ and $n_2$.
Table~\ref{tExCost} presents a few alternative plans that compute $R$.
The cost, in number of additions and multiplications, is given symbolically in terms of $n_1$ and $n_2$, as well as numerically with different ratios of $n_1$ and $n_2$.
Let's manually analyze the cost of a few alternative plans that compute $R$ for practice.
Recall that a dense matrix multiplication of an $a \times b$ matrix with an $b \times c$ matrix requires $ac$ inner products, each requiring $b$ multiplications and $b-1$ additions (really, $b$ additions when added to a 0 in the result matrix), for a total cost of $2abc$.
Table~\ref{tExCost} hints that the optimal plan for $R$ depends on the ratio of $n_2$ to $n_1$.
When $n_1 \ll n_2$, then the first plan appears to be cheapest.
When $n_1 \gg n_2$, then the second plan appears to be cheapest.
The third and fourth plans, while correct, do not appear to be cheapest for any ratio, as they do not take advantage of + factorizations.
Figure~\ref{fExPlot} on the ratio $r$ confirms this trend.
Note that the author's analysis missed a lower cost plan for inputs where $n_1 \gg n_2$. This plan is $AB(C+D) + B(CD)$. It is easy to miss such plans during manual analysis.
A real ML program contains more complicated expressions. Manual analysis is not feasible; an automated method for finding a (near-)optimal plan is called for. Let's walk through how we could optimize this expression in a more general fashion.
\subsection{MPlans}
Different ways of representing plans can make it easier to optimize them.
One of the simplest representations is the Directed Acyclic Graph (DAG).
A DAG representation presents inputs to an expression as leaves at the bottom, outputs as roots at the top, and operators as interior nodes.
Parsers parse DML expressions into DAGs.
In our example, a parser would parse the program from Listing~\ref{lExDML} into a DAG consisting of matrix operators, shown in Figure~\ref{fExMatrixDAG}.
This parsing assumes parentheses on the left of every multiplication and addition, as in $((AB)C + (BC)D) + (AB)D$.
The order of children is significant in this representation, which we call an MPlan.
The DAG shows the common subexpression $AB$ with two parents: the node computing $ABC$ and the node computing $ABD$.
Common subexpressions are important to recognize, because they can be computed once and reused for all parents.
The process of identifying common subexpressions inside a DAG is called \textit{common subexpression elimination} (i.e., eliminating duplicate subplans by reconnecting the parents of the duplicates to a single copy of the subplan).
It is possible to optimize MPlans directly, by writing rules over the operators MxM (matrix multiplication), $*$ (element-wise multiplication), $+$ (element-wise addition), $^\tr$ (transpose), $\sum_{row}$ (row-wise aggregation), $\sum_{col}$ (column-wise aggregation), and $\sum$ (full aggregation).
However, reasoning at the matrix DAG level is prone to missing possible rewrites,
such as interchanging the $\sum$ over an MxM or $*$.
We therefore consider a finer-grained representation.
\begin{figure}[tbh]
\centering
\begin{tikzpicture}[level distance=3em, node distance=3em,
level 1/.style={sibling distance=8em},
level 2/.style={sibling distance=8em},
level 3/.style={sibling distance=4em},
]
\node {$+$}
child {node {$+$}
child {node (ABC) {MxM}
child {node (AB) {MxM}
% child {node (Aab) {A}} %{$B_{ae}$}
% child {node (Bbc) {B}} %{$C_{ef}$}
}
child[missing]% {node (Ccd) {C}} %{$D_{fd}$}
}
child {node (BCD) {MxM}
child {node (BC) {MxM}
% child {node (Bae) {B}} %{$B_{ae}$}
% child {node (Cef) {C}} %{$C_{ef}$}
}
child[missing]% {node (Dfd) {D}} %{$D_{fd}$}
}
}
child {node (ABD) {MxM}
child[missing]
% {node {MxM}
% child {node (Aag) {A}} %{$A_{ag}$}
% child {node (Bgh) {B}} %{$B_{gh}$}
% }
child[missing] %{node (Dhd) {D}} %{$D_{hd}$}
}
;
\node (A) [below=of AB,xshift=2em] {$A$};
\node (B) [right=of A] {$B$};
\node (C) [right=of B] {$C$};
\node (D) [right=of C] {$D$};
\draw (A.north) -- (AB);
% \draw (A.north) -- (Aag.south);
\draw (B.north) -- (AB);
\draw (B.north) -- (BC);
% \draw (B.north) -- (Bgh.south);
\draw (AB) -- (ABD);
\draw (C.north) -- (ABC);
\draw (C.north) -- (BC);
\draw (D.north) -- (BCD);
\draw (D.north) -- (ABD);
\end{tikzpicture}
\caption{MPlan, parsed from Listing~\ref{lExDML}}
\label{fExMatrixDAG}
\end{figure}
As a brief note, we would like to mention that DML programs contain compound operators in addition to the ones listed above.
For example, $-$ (element-wise subtraction) and \^{} (element-wise power) are DML operators that can be written in terms of more basic ones, by applying the rules
\begin{align*}
A - B &\to A + (-1)*B \\
\forall k \in \textsc{Z}^+,\, A^k &\to A * A^{k-1} \\
A^0 &\to 1
\end{align*}
Other operators in DML, such as min and max, cbind (horizontal concatenation), reshape (permute the entries of a matrix into a different shape), and table (construct a matrix from a vector of row indices, a vector of column indices, and a vector of values) are deferred to future work. Including these operators is important because it may be possible to optimize through them by, for example, pushing down a matrix multiply through a cbind($A$,$B$) operator in order to optimize it with the plans that compute $A$ and $B$. In this work, we simply model these operators as uninterpreted black boxes.
\subsection{RPlan DAG}
The elementary operators of linear algebra are patterns of $+$ and $*$.
In this section we introduce two patterns that have sufficient expressiveness to represent all the matrix operators in the MPlans. We call the new representation RPlans, short for relation plans.
The first pattern is the join.
Element-wise addition or multiplication join their arguments on matching row and column indices, adding or multiplying matching values.
The first stage of matrix multiply joins the column indices of the first matrix to equal row indices of the second matrix, multiplying matching values.
The first stage of the inner product of two vectors joins on their matching indices.
The outer product of two vectors joins without any join conditions, i.e., taking the Cartesian product of indices.
Thus, joins in linear algebra may involve 0, 1, or 2 join conditions of indices.
In the spirit of relational algebra, from which we borrow the join operation, we will refer to indices as \textit{attributes}.
The objects in the RPlan algebra representation can be considered \textit{relations},
whose attribute names are known as its \textit{schema}.
The schema of a join has attributes equal to the union of its inputs' schema.
We use the symbols $*$ and $+$ to denote the $n$-ary join that multiplies or adds matching values.
The two operators are similar, but there is an important difference between them.
If we imagine the value 0 as a ``null value'' in the relational sense,
then $*$ acts like an inner join, because multiplication has the annihilator property that for any number $x$, $0 * x = 0$,
whereas $+$ acts like an outer join, because addition does not have the annihilator property.
Nuances like these are important when writing a cost model.
%k-relation
In order to represent join conditions, we adopt the convention of natural join semantics, i.e., that attributes whose names match are meant to be joined together.
We use two operators to manage attributes:
the \textit{bind} operator to assign attribute names, and the \textit{unbind} operator to un-assign attributes.
We write $\bind{ab}$ to bind the indices $a$ and $b$ to the rows and columns of a matrix.
We write $\unbind{ab}$ to un-assign the attribute $a$ as the row and $b$ as the column in a matrix.
Vectors require only one attribute name, because they have a single dimension.
The semantics of bind are to open a stream of scalars from the input that are labelled with the attribute names they are bound to. Bound names may be thought of as iterating over the input values.
The semantics of unbind are to collect a stream of scalars into a matrix object, using the positions of the unbound attributes to determine which index is the row and which is the column.
The transpose operator is therefore represented in RPlans as an unbind that reverses the order of unbound names.
For example, the plan $\unbind{ba}( \bind{ab}A )$ transposes the input $A$ by unbinding index names opposite the order in which the indices were bound.
Applying this to matrix multiply, the first stage of the plan $A \text{ MxM } B$ can be represented as $\bind{ab} A * \bind{bc} B$.
The name $b$ joins the rows of $A$ to the columns of $B$.
The result is a 3-tensor, i.e. a tensor of 3 dimensions, because it has three attributes in its schema: $a, b, c$.
The final part of a matrix multiply can be represented with aggregation over $b$.
Aggregation is an operator that is annotated with the attributes to be aggregated, in this case $\sum_b$.
The schema of an aggregation is the schema of its input minus the aggregated attributes.
Thus, a complete expression for the plan $A \text{ MxM } B$ is $\unbind{ac} \sum_b( \bind{ab} A * \bind{bc} B )$.
\subsection{RPlan DAG in Canonical Form}
A common way to write the example in math, assuming parentheses on the left of every multiplication, is
\begin{align*}
R_{ad} &=
\sum_c \lrparen{\sum_b A_{ab} B_{bc}} C_{cd} +
\sum_f \lrparen{\sum_e B_{ae} C_{ef}} D_{fd} \\&\qquad\qquad +
\sum_h \lrparen{\sum_g A_{ag} B_{gh}} D_{hd} \\
&= \sum_{bc} A_{ab} B_{bc} C_{cd} + \sum_{ef} B_{ae} C_{ef} D_{fd} + \sum_{gh} A_{ag} B_{gh} D_{hd}
\end{align*}
The above math includes indices (i.e., the letters $a, b, c$, etc.) that bind to the rows and columns of the input matrices, in a way that matches the semantics of matrix multiplication:
the column of the first matrix joins to (has the same attribute name as) the row of the second matrix, which is aggregated away after multiplication.
The rewriting uses the distributive law of $*$ over $\sum$ and the commutativity of $\sum$ to pull the aggregation operators to the top of the expression and push the multiplication operators to the bottom.
% Talk about relational interpretation over k-relations?
``Top'' and ``bottom'' refer to the presentation of the expression in the form of a DAG.
The DAG has one root corresponding to the output of the expression, and one leaf for each input.
Internal nodes take the form of $+$, $*$, and $\sum$.
Figure \ref{fExDag} presents a DAG of the expressions both before and after rewriting into canonical form.
In general, the canonical form of an RPlan is one that, from bottom to top, includes the inputs, a bind layer, a multiplication layer, an aggregation layer, an addition layer, and an unbind layer.
\begin{figure}[tbh]
\centering
\subcaptionbox{Plan to compute $((AB)C+(BC)D)+(AB)D$\label{fExDagOrig}}{
\begin{tikzpicture}[level distance=3em, node distance=3em,
level 2/.style={sibling distance=12em},
level 3/.style={sibling distance=5em},
level 4/.style={sibling distance=3em},
]
\node {\unbind{ae}}
child {node {$+$}
child {node {$+$}
child {node {$\sum_{c}$} child {node {$*$}
child {node {$\sum_b$} child {node {$*$}
child {node (Aab) {\bind{ab}}} %{$B_{ae}$}
child {node (Bbc) {\bind{bc}}} %{$C_{ef}$}
}}
child {node (Ccd) {\bind{cd}}} %{$D_{fd}$}
}}
child {node {$\sum_{f}$} child {node {$*$}
child {node {$\sum_e$} child {node {$*$}
child {node (Bae) {\bind{ae}}} %{$B_{ae}$}
child {node (Cef) {\bind{ef}}} %{$C_{ef}$}
}}
child {node (Dfd) {\bind{fd}}} %{$D_{fd}$}
}}
}
child {node {$\sum_{h}$} child {node {$*$}
child {node {$\sum_g$} child {node {$*$}
child {node (Aag) {\bind{ag}}} %{$A_{ag}$}
child {node (Bgh) {\bind{gh}}} %{$B_{gh}$}
}}
child {node (Dhd) {\bind{hd}}} %{$D_{hd}$}
}}
};
\node (A) [below=of Bbc,xshift=2em] {$A$};
\node (B) [right=of A] {$B$};
\node (C) [right=of B] {$C$};
\node (D) [right=of C] {$D$};
\draw (A.north) -- (Aab.south);
\draw (A.north) -- (Aag.south);
\draw (B.north) -- (Bbc.south);
\draw (B.north) -- (Bae.south);
\draw (B.north) -- (Bgh.south);
\draw (C.north) -- (Ccd.south);
\draw (C.north) -- (Cef.south);
\draw (D.north) -- (Dfd.south);
\draw (D.north) -- (Dhd.south);
\end{tikzpicture}
}
\subcaptionbox{Plan rewritten into canonical form\label{fExDagCanonical}}{
\begin{tikzpicture}[level distance=3em, node distance=3em,
level 2/.style={sibling distance=7em},
level 3/.style={sibling distance=3.5em},
level 4/.style={sibling distance=2.2em},
]
\centering
\node {\unbind{ae}}
child {node {$+$}
child {node {$\sum_{bc}$}
child {node {$*$}
child {node (Aab) {$\bind{ab}$}} %{$A_{ab}$}
child {node (Bbc) {\bind{bc}}} %{$B_{bc}$}
child {node (Ccd) {\bind{cd}}} %{$C_{cd}$}
}
}
child {node {$\sum_{ef}$}
child {node {$*$}
child {node (Bae) {\bind{ae}}} %{$B_{ae}$}
child {node (Cef) {\bind{ef}}} %{$C_{ef}$}
child {node (Dfd) {\bind{fd}}} %{$D_{fd}$}
}
}
child {node {$\sum_{gh}$}
child {node {$*$}
child {node (Aag) {\bind{ag}}} %{$A_{ag}$}
child {node (Bgh) {\bind{gh}}} %{$B_{gh}$}
child {node (Dhd) {\bind{hd}}} %{$D_{hd}$}
}
}
};
\node (A) [below=of Bbc] {$A$};
\node (B) [below=of Bae,xshift=-.33em] {$B$};
\node (C) [below=of Dfd,xshift=.33em] {$C$};
\node (D) [below=of Bgh] {$D$};
\draw (A.north) -- (Aab.south);
\draw (A.north) -- (Aag.south);
\draw (B.north) -- (Bbc.south);
\draw (B.north) -- (Bae.south);
\draw (B.north) -- (Bgh.south);
\draw (C.north) -- (Ccd.south);
\draw (C.north) -- (Cef.south);
\draw (D.north) -- (Dfd.south);
\draw (D.north) -- (Dhd.south);
\end{tikzpicture}
}
\caption{DAG representation of plans to compute $ABC+BCD+ABD$}
\label{fExDag}
\end{figure}
\subsection{Mockup bottom-up plan search}
\section{Formal Definitions}
\subsection{MPlans, RPlans, Mixed Plans}
\begin{table}[tbh]
\begin{tabular}{llll}
&name & type & syntax \\
\hline
\multirow{7}{*}{\rotatebox[origin=c]{90}{MPlan}}
&mmult & $M_{M,L} \times M_{L,N} \rightarrow M_{M,N}$ & $AB$ or MxM \\
&elemmult & $M_{M,N} \times M_{M,N} \rightarrow M_{M,N}$ & $A*B$ \\
&elemplus & $M_{M,N} \times M_{M,N} \rightarrow M_{M,N}$ & $A+B$ \\
&rowagg & $M_{M,N} \rightarrow M_{M,1}$ & $\sum_{row} A$ \\
&colagg & $M_{M,N} \rightarrow M_{1,N}$ & $\sum_{col} A$ \\
&agg & $M_{M,N} \rightarrow M_{1,1}$ & $\sum A$ \\
%&power & $M \times Int \rightarrow M$ & $A^k$ \\
&transpose & $M_{M,N} \rightarrow M_{N,M}$ & $A^\tr$ \\
\hline
\multirow{2}{*}{\rotatebox[origin=c]{90}{conv.}}
&bind & $M_{M,N} \times [i,j] \rightarrow R_{i:M,j:N}$ & $\bind{i,j} A$ \\
&unbind & $R_{i:M,j:N} \times [i,j] \rightarrow M_{M,N}$ & $\unbind{i,j} A$ \\
\hline
\multirow{2}{*}{\rotatebox[origin=c]{90}{RPlan}}
&join & $R_{S_1} \times R_{S_2} \rightarrow R_{S_1 \cup S_2}$ & $A*B$ or $A+B$ \\
&agg & $R_S \times U \rightarrow R_{S \setminus U}$ & $\sum_{U} A$ \\
\end{tabular}
\caption{Operators of MPlans and RPlans. Uninterpreted black-box functions are also allowed.
The type $M_{M,N}$ is a matrix of size $M \times N$; $[i,j]$ is a list of attribute names; $R_{i:M,j:N}$ is a relation with attributes $i$ of size $M$ and $j$ of size $N$; $S_1, S_2, S,$ and $U$ are sets of attributes.}
\label{tPlanOps}
\end{table}
The Matrix Plan Algebra is an algebra over real number matrices with the operators listed in the top third of Table~\ref{tPlanOps}.
The type of an matrix with dimensions $M \times N$ is written $M_{M,N}$.
Vectors have orientation and are written as either $M_{1,N}$ or $M_{M,1}$; scalars have type $M_{1,1}$.
An \textit{MPlan}---short for Matrix Plan---is an expression in the Matrix Plan Algebra.
The Relation Plan Algebra is an algebra over relations (whose attributes' domain is a prefix of the positive integers) with the operators listed in the bottom third of Table~\ref{tPlanOps}.
The domain of each attribute is a prefix of the positive integers; we say an attribute has size $s$ if its domain is the integers $1, 2, \dots, s$.
The notation $R_{i:M,j:N,k:L}$ indicates a relation $R$ with attributes $i$ of size $M$, $j$ of size $N$, and $k$ of size $L$.
Each tuple of a relation is annotated with a real number.
The set of a relation's attributes is called its schema.
We write $i \in A$ if $i$ is an attribute name present in $A$'s schema.
An \textit{RPlan}---short for Relation Plan---is an expression in the Relation Plan Algebra.
The middle third of Table~\ref{tPlanOps} presents the bind and unbind conversion operators between matrices and relations. Both operators are annotated with a pair of attribute names, either of which may be the `\_' symbol to match the dimension of the input type. % whose length matches the input's dimension
\begin{definition}
An \textbf{\emph{MPlan}} is an expression in the Matrix Plan Algebra, except that relation objects (but not relation operators) and conversion operators are also allowed. An MPlan is \emph{pure} if it does not contain conversion operators.
An \textbf{\emph{RPlan}} is an expression in the Relation Plan Algebra, except that matrix objects (but not matrix operators) and conversion operators are also allowed. An RPlan is \emph{pure} if it does not contain conversion operators.
A \textbf{\emph{Mixed Plan}} is an expression in the product of the Matrix Plan Algebra and the Relation Plan Algebra, extended with the conversion operators.
\end{definition}
MPlans, RPlans, and Mixed Plans are depicted as DAGs, to emphasize that they may contain common sub-expressions (CSEs) or multiple outputs (multiple roots).
They may also contain uninterpreted (black-box) functions.
\subsection{Converting MPlans to RPlans}
An MPlan may be converted into a semantically equivalent RPlan via the rewrite rules $R_{MR}$ in Figure~\ref{RMR}.
This list includes additional rules for a couple common operators in the DML language: element-wise subtraction and positive integer exponentiation.
\begin{figure}[htb]
\begin{enumerate}
\item $A - B \rightarrow A + (-1) * B$
\item For small integers $k > 1$, $A^k \rightarrow A * A^{k-1}$ and $A^1 \to A$.
\item $A*B \rightarrow \unbind{i,j}(\bind{i,j}A * \bind{i,j}B)$. Same pattern for $+$.
\item $AB \rightarrow \unbind{i,k} \sum_j (\bind{i,j}A * \bind{j,k}B)$.
\item $A^\tr \rightarrow \unbind{j,i} \bind{i,j}A$.
\item $\sum_{row} A \rightarrow \unbind{i,\_} \sum_j \bind{i,j} A$. Similar pattern for $\sum_{col}$, $\sum$.
\end{enumerate}
\caption{MPlan-to-RPlan Ruleset $R_{MR}$}
\label{RMR}
\end{figure}
Applying the $R_{MR}$ rules replaces matrix operators with relation operators, with conversion operators in between to convert matrices to relations and back. The result is an RPlan.
It follows from the $R_{MR}$ rules that the space of MPlans is a subset of the space of RPlans. We prove that it is a strict subset in the following:
\begin{lemma}
Let \MPLAN{} and \RPLAN{} be the set of all MPlans and RPlans.
Then $R_{MR}(\MPLAN{}) \subsetneq \RPLAN{}$.
\end{lemma}
% No confluence needed (confluence means that no matter what order we apply RMR rules, the result is the same).
\begin{proof}
First we need to show that the ruleset $R_{MR}$ is terminating on any MPlan, or else the lemma is improper. It is terminating because each rule (except (1)) removes a matrix operator, and an MPlan has finitely many matrix operators. For rule (2), we use the fact that $k$ decreases and is bounded below to prove termination.
Suppose $p \in \MPLAN{}$.
We prove that $R_{MR}(p) \in \RPLAN{}$ by structural induction on $p$.
The base case, that $p$ is a lone matrix, is trivially an RPlan.
The rules $R_{MR}$ compose the inductive case; for each matrix operator, there is a rule in $R_{MR}$ that converts it to a composition of relation operators and conversion operators. Rule (2) requires a second induction, but otherwise the proof is straightforward.
To show that $R_{MR}(\MPLAN{}) \neq \RPLAN{}$, consider the RPlan $\bind{i,j}A * \bind{k,l}B$.
This RPlan computes a tensor product and has schema $\{i,j,k,l\}$. As its dimension is 4, no representation as an MPlan is possible because the maximum dimension of an MPlan is 2.
\end{proof}
The unbind and bind operators can be eliminated or moved around to create ``pipelines'' of RPlan operators.
For example, the rule $\bind{i,j} \unbind{i,j} A \to A$ removes a bind-unbind sequence, assuming that care is taken to properly handle CSEs.
More generally the rule $\bind{x,y} \unbind{i,j} A \to A[i \to x, j \to y]$ indicates that the attributes $i$ and $j$ in $A$'s schema should be renamed to $x$ and $y$, by propagating the rename downward into $A$.
%This pattern will be discussed in more detail.
The result of eliminating/shifting unbind and bind operators is an RPlan with a bind operator above every (nonscalar) input and an unbind operator below every (nonscalar) output.
\subsection{RPlan Canonical Form}
We now introduce a canonical form for RPlans.
The purpose of the form is to (1) identify common sub-expressions (i.e., expressions that may differ in syntax but compute the same function) and (2) act as a neutral starting point from which we may systematically enumerate alternative plans. Fulfilling these roles requires the canonical property, that two RPlans have the same canonical form if and only if they are semantically equivalent (i.e., they compute the same result on every input).%\footnote{Our canonical form technically is only canonical up to uninterpreted constants. Constant folding, such as replacing $2 + 3$ with 5, occurs before and after our optimization process and affects the estimated cost of plans little.}
Our form is similar in spirit to the disjunctive normal form (DNF) in propositional logic, also known as a truth table. DNF formulae are rarely the most efficient form to execute, but they do act as a canonical form and also serve as a starting point for finding more efficient forms. %((Reference K-V maps?))
Our form is also similar to the canonical form of polynomials as a sum of monomials, except that the monomial terms also include aggregations and constants are treated as black-box symbols, rather than real numbers.
The analogy to polynomials extends further: the canonical form of polynomials combines ``like terms,'' i.e., two terms that have the same variables. Instead of writing $3x^2y + 5x^2y$, for example, we write its canonical form as $8x^2y$.
For RPlans, we determine whether two terms are like by whether there exists an isomorphism mapping one to the other by renaming attributes (up to a constant scalar).
The canonical form for MPlans is the sum-of-aggregations-of-products with like terms combined, as follows:
\begin{definition}
An RPlan is \textbf{canonical} if it can be written as the sum of $n$ terms, each of which is an aggregation over a (possibly empty) attribute set $A_i$ (for $1 \leq i \leq n$), each of whose inputs is the multiplication of $m_i$ base input matrices $b_{ij}$ (for $1 \leq i \leq n$, $1 \leq j \leq m_i$):
\[\sum_{A_1} \lrparen{b_{11} * \dots * b_{1m_1}} %+ \sum_{A_2} (b_{21} * \dots * b_{2m_2})
+ \dots + \sum_{A_n} \lrparen{b_{n1} * \dots * b_{nm_n}}\]
Further, there must not exist, for any $1 \leq i < j \leq n$, an isomorphism $\phi$ renaming attributes in the schema of $b_{i1} * \dots * b_{im_i}$ to attributes in the schema of $b_{j1} * \dots * b_{jm_j}$ such that $m_i=m_j$, $\phi(A_i) = A_j$, and $\phi\{b_{i1} * \dots * b_{im_i}\} = \{b_{j1} * \dots * b_{jm_j}\}$.
The order of summands and multiplicands is insignificant, due to the commutative and associative properties of $*$ and $+$. A canonical order within these can be established when necessary by, e.g., hashing.
% A canonical form may be abbreviated as \\
% $[(A_1; b_{11}, \dots, b_{1m_1}), \dots, (A_n; b_{n1}, \dots, b_{nm_n})]$.
\end{definition}
Figure~\ref{RRC} shows the rewrite rules $R_{RC}$ that canonize RPlans by pulling additions to the top of an expression, followed by aggregations and last multiplications.
Let $\mathcal{C}$ be the transitive application of ruleset $R_{RC}$.
\begin{figure}[htb]
\begin{enumerate}
\item\label{RRC_mp} $A * (B + C) \rightarrow A*B + A*C$
\item\label{RRC_ap} $\sum_i (A + B) \rightarrow \sum_i A + \sum_i B$
\item\label{RRC_ma} If $i \not\in A$, $A * \sum_i B \rightarrow \sum_i (A * B) \quad$ (else rename indices)
\item\label{RRC_aa} $\sum_i \sum_j A \rightarrow \sum_{i,j} A$
% \item $\sum_i A \rightarrow \sum_{i,j} A$
\item\label{RRC_ac} If $i \not\in A$, then $\sum_i A \rightarrow A * |i|$
\item\label{RRC_pp} $A + (B + C) \rightarrow +(A, B, C) \quad$ (binary to n-ary addition)
\item\label{RRC_mm} $A * (B * C) \rightarrow *(A, B, C) \quad$ (binary to n-ary multiply)
\item\label{RRC_c} Constant fold constant terms $c_1 + c_2$, $c_1*c_2$
\item\label{RRC_cc}
If $c_1, c_2$ are constant and $T_1, T_2$ are isomorphic up to renaming,
then $c_1 * T_1 + c_2 * T_2 \to (c_1 + c_2) * T_1$ and apply rule (\ref{RRC_c})
\end{enumerate}
\caption{RPlan-to-Canonical-RPlan Ruleset $R_{RC}$}
\label{RRC}
\end{figure}
We now prove a few properties of $\mathcal{C}$ in order to show that it truly is canonizing.
% Each rewrite preserves the expression's semantics, as they stem from the mathematics of semirings and aggregates.
\begin{lemma}
$\forall p \in \RPLAN{},\, \mathcal{C}(p)$ is in canonical form.
\end{lemma}
\begin{proof}
$R_{RC}$ rules 8 and 9 combine isomorphic-up-to-constant terms.
Proceed by structural induction over an RPlan $p$.
The base case, that $p$ is a lone relation, is canonical.
The inductive cases are as follows.
1) Suppose $p = p^1 + p^2$.
By the inductive hypothesis, $p^1$ and $p^2$ are canonical with forms, for $i \in \{1,2\}$, \\
$\sum_{A^i_{1}} \lrparen{b^i_{11} * \dots * b^i_{1m^i_{1}}} + \dots + \sum_{A^i_{n^i}} \lrparen{b^i_{n^i1} * \dots * b^i_{n^im^i_{n^i}}}$.
By rule~\ref{RRC_pp}, $p$ takes the canonical form \\
$\sum_{A^1_{1}} \lrparen{b^1_{11} * \dots * b^1_{1m^1_{1}}} + \dots + \sum{A^1_{n^1}} \lrparen{b^1_{n^11} * \dots * b^1_{n^1m^1_{n^1}}} +
\sum_{A^2_{1}} \lrparen{b^2_{11} * \dots * b^2_{1m^2_{1}}} + \dots + \sum{A^2_{n^2}} \lrparen{b^2_{n^21} * \dots * b^2_{n^2m^2_{n^2}}}$.
2) Suppose $p = \sum_A p'$.
By the inductive hypothesis, $p'$ is canonical with form \\
$\sum_{A_1} \lrparen{b_{11} * \dots * b_{1m_1}} + \dots + \sum_{A_n} \lrparen{b_{n1} * \dots * b_{nm_n}}$.
Rule \ref{RRC_ap} distributes the aggregate into plus;
rule \ref{RRC_ac} handles constant aggregations, as after the expression $\sum_{ij} (X + 7)$;
rule \ref{RRC_aa} combines aggregates.
The result has canonical form \\
$\sum_{A_1 \cup A} \lrparen{b_{11} * \dots * b_{1m_1}} + \dots + \sum_{A_n \cup A} \lrparen{b_{n1} * \dots * b_{nm_n}}$.
3) Suppose $p = p^1 * p^2$.
By the inductive hypothesis, $p^1$ and $p^2$ are canonical with forms, for $i \in \{1,2\}$, \\
$\sum_{A^i_{1}} \lrparen{b^i_{11} * \dots * b^i_{1m^i_{1}}} + \dots + \sum_{A^i_{n^i}} \lrparen{b^i_{n^i1} * \dots * b^i_{n^im^i_{n^i}}}$.
% $\sum_{A_{i1}} \lrparen{b_{i11} * \dots * b_{i1m_1}} + \dots + \sum{A_{in}} \lrparen{b_{in1} * \dots * b_{inm_n}}$.
Rule \ref{RRC_mp} multiplies across the Cartesian product of summands of $p_1$ and $p_2$.
Rules \ref{RRC_ma}, \ref{RRC_mm}, and \ref{RRC_aa} push the multiplications further down into aggregations and other multiplications.
In particular, rule \ref{RRC_ma} may rename attributes; we use the `$\sqcup$' symbol to denote disjoint union after renaming.
The result is the canonical form \\
$\sum_{A^1_{1} \sqcup A^2_{1}} \lrparen{b^1_{11} * \dots * b^1_{1m^1_1} * b^2_{11} * \dots * b^2_{1m^2_1}} + \dots + \\
\sum_{A^1_{n^1} \sqcup A^2_{n^2}} \lrparen{b^1_{n^11} * \dots * b^1_{n^1m^1_{n^1}} * b^2_{n^21} * \dots * b^2_{n^2m^2_{n^2}}}$.
\end{proof}
\begin{lemma}\label{lCanonPreservesSemantics}
Canonization preserves semantics: \\
$\forall p \in \RPLAN{}, \forall \text{inputs } d,\, p(d) = \mathcal{C}(p)(d)$.
\end{lemma}
\begin{proof}
Every rule composing $\mathcal{C}$ in $R_{RC}$ stems from a mathematical property of $+$, $*$, and $\sum$.
\end{proof}
In the following proof, we write syntactic equality with the symbol $\equiv$, in order to distinguish it from value equality.
\begin{theorem}
Two RPlans are semantically equivalent iff they have identical canonical forms:
$\forall p_1, p_2 \in \RPLAN{},\, \mathcal{C}(p_1) \equiv \mathcal{C}(p_2) \iff \forall \text{inputs }d,\, p_1(d) = p_2(d)$.
\end{theorem}
\begin{proof}
For the $\Rightarrow$ direction, two plans that have syntactically identical canonical forms must compute the same result on every input because, per Lemma \ref{lCanonPreservesSemantics}, canonization preserves semantics.
The $\Leftarrow$ direction is more interesting. Proceed by proving its contrapositive:
$\mathcal{C}(p_1) \not\equiv \mathcal{C}(p_2) \Rightarrow \exists d,\, p_1(d) \neq p_2(d)$.
It is sufficient to prove
$\mathcal{C}(p_1) \not\equiv \mathcal{C}(p_2) \Rightarrow \exists d,\, \mathcal{C}(p_1)(d) \neq \mathcal{C}(p_2)(d)$
because, by Lemma \ref{lCanonPreservesSemantics}, $\mathcal{C}(p)(d) = p(d)$.
To find a $d$ that the canonical forms disagree on, given that the canonical forms syntactically differ,
we supply a partial input (i.e., input relations with holes) that reduces each $\mathcal{C}(p)$ to syntactically different polynomials in canonical form.
Then, by Lemma XX, we use the fact that syntactically different polynomials in canonical form differ on a concrete input,
and we construct a full input database $d$ using this concrete input to finish the proof.
First we construct a partial input.
Pick a term $t_0 \in p_1$ such that $t_0 \not\in p_2$. Such a term must exist because $p_1 \not\equiv p_2$. [Todo: introduce graphs]
Next pick the term $t \in p_1 \text{ or } p_2$ such that $t$ is the maximal term among all terms in $p_1$ and $p_2$ that have a surjective homomorphism to $t$.
We shall design a partial input database using $t$.
Suppose there are $n$ attributes (before aggregating) in $t$.
Assign each attribute a unique positive integer between $1$ and $n$ and let $g$ be the bijection from $t$'s attributes to $[n]$.
For each multiplicand $A_{ij}$, introduce a fresh scalar variable $a_{ij}$ at entry $(g(i), g(j))$ of the input matrix $A$.
Let other entries in each input matrix be 0.
Expand the aggregations in $p_1$ and $p_2$ with respect to the new symbolic matrices.
The result is a polynomial in canonical form.
It is in canonical form because the sums-of-aggregations-of-products in the original RPlans expands to a polynomial in sum-of-products form,
and as no two terms in the original RPlan were isomorphic, neither are any two terms in the polynomial. Todo: say more about the surjective homomorphism.
Call the above reduction $d(t)$, where $t$ is the term chosen by the above process. We write $t_1[d(t)]$ for the result of applying the partial input generated by term $t$ to $t_1$. Two properties of the transformation are relevant:
\begin{enumerate}
\item For any two terms $t_1$ and $t_2$ that are isomorphic, $t_1[d(t)] \equiv t_2[d(t)]$.
To see this, observe that each multiplicand in $t_1[d(t)]$ arose from a multiplicand in $t_1$, and that there is an equivalent multiplicand in $t_2[d(t)]$ multiplied in the same way due to the isomorphism between $t_1$ and $t_2$.
\item For any terms $t_1$ and $t_2$ that are not isomorphic, if there does not exist a surjective homomorphism from $t_2$ to $t_1$, then $t_1[d(t)] \not\equiv t_2[d(t)]$. Proof todo.
\end{enumerate}
Because $\mathcal{C}(p_1)$ syntactically differs from $\mathcal{C}(p_2)$, then by the second property above it follows that $\mathcal{C}(p_1)[d(t)]$ is a polynomial that syntactically differs from $\mathcal{C}(p_2)[d(t)]$.
Applying Lemma XX on polynomial canonical forms proves that a particular input on the polynomial variables $d$ exists on which the two polynomials disagree. Plugging $d$ into the symbolic matrices designed by $d(t)$ results in a database input to the original RPlans disagree, showing that $\exists d,\, \mathcal{C}(p_1)(d) \neq \mathcal{C}(p_2)(d)$.
\end{proof}
\subsection{RPlan Graphbag}
% An RPlan \textit{without uninterpreted functions}\footnote{Uninterpreted functions split an RPlan in two: the RPlan below and the RPlan above. Each may be represented as a separate RPlan Graphbag.} in canonical form may be represented as a bag of hypergraphs, defined as follows.
% Each child of $+$ is represented as a separate graph, the collection of which forms a bag of graphs.
% A child of $+$ consists of a set of aggregated attributes, followed by the product of inputs.
% The graph is a directed and labeled hypergraph.
% The vertices $V$ are the union of the schemas of the inputs.
% Each vertex is either \textit{aggregated}, if it is in the set of aggregated attributes, or \textit{output}, if it is not.
% There is one hyperedge for each input, labeled with the input itself, on the vertices in its schema *** todo need to include the input bind and output bind *** redefine to include conv. operators -- RPlan+
\nocite{CohenDDHW09}
% OLD VERSION
% For the $\Rightarrow$ direction, two plans that have syntactically identical canonical forms must compute the same result on every input because, per Lemma \ref{lCanonPreservesSemantics}, canonization preserves semantics.
% The $\Leftarrow$ direction is more interesting. Proceed by proving its contrapositive:
% $\mathcal{C}(p_1) \neq_{syn} \mathcal{C}(p_2) \Rightarrow \exists d,\, p_1(d) \neq p_2(d)$.
% It is sufficient to prove
% $\mathcal{C}(p_1) \neq_{syn} \mathcal{C}(p_2) \Rightarrow \exists d,\, \mathcal{C}(p_1)(d) \neq \mathcal{C}(p_2)(d)$
% because, by Lemma \ref{lCanonPreservesSemantics}, $\mathcal{C}(p)(d) = p(d)$.
% To find a $d$ that the canonical forms disagree on,
% given that the canonical forms syntactically differ,
% we use derivatives.
% It is well known that for any differentiable functions $f$ and $g$, $\forall d,\, f(d) = g(d) \Rightarrow \forall d,\, f'(d) = g'(d)$; the contrapositive of this fact gives us the proof technique we need.
% In order to take derivatives, we intentionally blur the line between RPlans and (multi)linear algebra, because an RPlan relation with $n$ attributes is equivalent to a tensor with $n$ named dimensions.
% Let $t$ be one of the summed terms in $\mathcal{C}(p_1)$ that does not appear in $\mathcal{C}(p_2)$, which must exist since they syntactically differ.
% By definition of the canonical form, no other term in $\mathcal{C}(p_1)$ is isomorphic-up-to-constants to $t$.
% Derivate $\mathcal{C}(p_1)$ and $\mathcal{C}(p_2)$ for each input in $t$ to obtain $\mathcal{C}(p_1)'$ and $\mathcal{C}(p_2)'$.
% Consider $d=0$ (zero for all inputs).
% We have that $t'(0) \neq 0$ because $t$ is a constant term (since we derivated $t$ by each input in $t$).
% Other terms in $\mathcal{C}(p_1)'$ evaluate to 0 because they either (a) began with a subset of the inputs in $t$, which results in their derivative being 0, or (b) have inputs remaining after taking the derivatives, which evaluate to 0 on input 0.
% Therefore $\mathcal{C}(p_1)'(0) = t'(0) \neq 0$.
% For every term in $\mathcal{C}(p_2)$ that is not isomorphic-up-to-constants to $t$, its derived counterpart in $\mathcal{C}(p_2)'$ evaluates to 0. There is at most one term in $\mathcal{C}(p_2)$ that is isomorphic-up-to-constants. If there is such a term, then it must have a different constant and so $t'(0) \neq \mathcal{C}(p_2)'(0)$, or else it would be identical to $t$ and $t$ would appear in $\mathcal{C}(p_2)$, contradicting our choice of $t$.
% Therefore $\mathcal{C}(p_1)'(0) \neq \mathcal{C}(p_2)'(0)$.
% As the derivatives differ on input 0, it must hold that $\exists d,\, \mathcal{C}(p_1)(d) \neq \mathcal{C}(p_2)(d)$.
\section{Related Work}
Relational join optimization: \cite{moerkotte2006analysis}.
Looks at the join graph, whose vertices are relations and edges are between two relations that have a join condition (overlap in their attributes). No aggregation or projection. Enumerates join tree possibilities without duplicates. Enumerates connected-subgraph complement pairs \emph{csg-cmp-pairs}, but this is automatically given by the query graph where attributes are vertices and relations are hyperedges, isn't it?
The closest sum-product optimization frameworks in the literature (FAQ \cite{KhamisNR16}, AJAR \cite{Joglekar2016AJARAA}, PANDA) cover LA expressions involving $*$ and $\sum$.
These frameworks have an excellent coverage of theory and implementation,
but they fall short of LA workloads that occur in practice.
We step past these frameworks by incorporating common sub-expressions and incorporating addition ($+$).
The lineage of optimizer work from Starburst <- System R -> Exodus -> Volcano -> Cascades -> Orca; Columbia -> Agrios/Bonneyville, as well as some UDF-like ones stemming from Nephele/PACTs