euphoria-flink: data type transparency #45

xitep · 2017-03-03T11:48:58Z

Execution engines can do much better at optimizations if they transparently know what types they are working with. Efforts in the Spark as well as the Flink community proof optimization potentials in this regard. As a layer above such execution engines Euphoria must provide type specific information to executors in order to stay relevant (in terms of performance). In this ticket we'll focus on Flink.

Motivation

The primary background for this ticket was triggered by an attempt to shave off some of the overhead mentioned in #13 and #14.

Experiment

Using opaque data types with general purpose serialization has considerable, negative effects on optimizations that Flink tries to apply by default. I was able to see the effect in an experiment, where the "core" operation of the flow is the following (basically just a windowed word-count):

      ReduceByKey
            .of(input)
            .keyBy(Pair::getSecond)
            .valueBy(e -> 1L)
            .combineBy(Sums.ofLongs())
            .windowBy(Time.of(shortInterval), Pair::getFirst)
            .output();

Changing Euphoria's flink batch executor such that it uses Flink's native and Flink's pojo based serializers for the types involved during the reduce operation, squeezed out about half of the original execution time of the program. The amount of data shuffled was approximately the same. Unfortunately, such an approach required me to explicitly provide the return type of the .keyBy function to Euphoria's FlinkExecutor's internals. I was not able to derive the return type of a lambda in an automatic manner without the user having to explicitly state it.

What to do next

Find a non-verbose way for clients to provide required type information to the executor translators, i.e. return types of UDFs (we might not really succeed here and might end up with some verbose construct; maybe we can make the verbose constructs and the implied optimizations merely an opt-in?)
Leverage the types in Euphoria's Flink batch and stream executors.
- Leverage type information for objects used internally in both flink executors to avoid general purpose serialization by kryo when possible.
- Using TupleX instead of Pair and Tripple is a good start.
- For internal types, utilizing POJOs instead of opaque data types is another easy gain.
- Note: it will not help much if we switch over to Tuple but fail to provide type information of the actually used fields.
- Attention: proper delegation of the window types from one operator to another due to attached windowing may require substantial changes to the current executor code.

The text was updated successfully, but these errors were encountered:

horkyada · 2017-03-27T14:46:48Z

We shouldn't forget that to allow Flink to fully operate on serialized data, it needs to determine the right offset of the desired data. Flink does it through the possibility of setting the number of the field(s) in tuples and names of the filed(s) in POJO. E.g.:

 DataStream<MyPojo<String, Integer>> longTrends = prepared
        .keyBy("query")
        .timeWindow(longInterval, shortInterval)
        .sum("count");

This should be probably covered by Euphoria too.

xitep added client-api flink performance labels Mar 3, 2017

xitep pushed a commit that referenced this issue Apr 12, 2017

#45 Add license header

12927b9

xitep pushed a commit that referenced this issue Apr 12, 2017

#45 Add @experimental

d1c8399

xitep pushed a commit that referenced this issue Apr 24, 2017

#45 Add license header

e459f4c

xitep pushed a commit that referenced this issue Apr 24, 2017

#45 Add @experimental

340e93f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

euphoria-flink: data type transparency #45

euphoria-flink: data type transparency #45

xitep commented Mar 3, 2017 •

edited

Loading

horkyada commented Mar 27, 2017

euphoria-flink: data type transparency #45

euphoria-flink: data type transparency #45

Comments

xitep commented Mar 3, 2017 • edited Loading

Motivation

Experiment

What to do next

horkyada commented Mar 27, 2017

xitep commented Mar 3, 2017 •

edited

Loading