euphoria-core: map-side join support #41

xitep · 2017-02-24T13:55:53Z

Map-side join is a special derivation of the Join operator, which can be turned into a plain MapElements operation, looking up the other side of the data to join in through a random-access API.

We'd like the write down such join operations through the existing Join operator. The special derivation mentioned above is merely a runtime characteristic which does not alter the semantics of the operation. In other words, a map-side join might be more efficient in certain situations, but the operator would still produce the results if carried out through a conventional (hash or sort based) join implementation. Prerequisites:

We'd need to provide support for data sets that can be accessed randomly.
We'd need to provide some way of hinting an executor to either do or not to do a map-side join "optimization" when it's possible.
The join operation would be effectively only a left or a right join (with zero or exactly one joined-value to be practical) - inner join semantics can be implemented through both of these.

On the other hand, if we are about to join two data sets which are already ordered by the "join key" and one or the other would provide the possibility to seek, we'd be able to optimize such a map-side join even more efficiently by turning the look-up into a re-seek. for a left-join, this would naturally support N join-values, which the random-access approach doesn't. As we can see, this is approach is a super-set of the random-access-approach. The canonical use-case would be to join to distinct databases with the same "primary key" (a typical key/value store delivers items ordered by the "key").

We clearly need more elaboration here, before starting with it. TBD

The text was updated successfully, but these errors were encountered:

je-ik · 2017-02-28T09:08:05Z

The main issue here is how to ensure that the operation is guaranteed to have the same outcome as if it would be executed in classical reduce-state-by-key manner. There is no problem with this when there is no windowing (i.e. batch windowing), but with some other windowing, it might be tricky, as it would probably require that the random access storage would "understand" the windowing and that it would be able to store multiple windowed values for the same key.

I'd suggest, that we restrict the scope of this issue only to non-windowed (batch windowed) joins. Then it might solve part of the comments in #38.

xitep added the idea label Feb 24, 2017

xitep mentioned this issue Feb 27, 2017

euphoria-core: unbounded source without explicit windowing defined #38

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

euphoria-core: map-side join support #41

euphoria-core: map-side join support #41

xitep commented Feb 24, 2017

je-ik commented Feb 28, 2017

euphoria-core: map-side join support #41

euphoria-core: map-side join support #41

Comments

xitep commented Feb 24, 2017

je-ik commented Feb 28, 2017