Benchmarks

We maintain a set of programs for measuring the performance of the Euphoria executors. To quantify Euphoria's runtime overhead, we also implement each benchmark program directly against the native APIs of the underlying execution engines. We aim to implement each program as naturally and idiomatically as the corresponding API allows. As a nice side effect, this also lets us compare the programs in terms of readability. Below is a list of the benchmark programs, each with a brief outline of the calculation and the expected format of the input data.

Trends

Given a log of user search queries from a site such as Seznam or Google, the goal is to find "trending" queries, i.e. queries that become "hot" within a short period of time, e.g. the last hour. This boils down to comparing a query's popularity during the last hour against its popularity over a longer period, for example the last day. A query's trending score is computed as:

(shortCount / (longCount + smoothness)) * (longInterval / shortInterval)

where:

shortCount: the number of occurrences of the query during the short interval, e.g. the last hour
longCount: the number of occurrences of the query during the long interval, e.g. the last day
shortInterval: the length of the short interval, e.g. 1 hour
longInterval: the length of the long interval, e.g. 24 hours
smoothness: a sufficiently large constant that gives every query a baseline count, so that a query with very little history does not receive an inflated score
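
To make the formula concrete, here is a minimal Python sketch of the score computation. The function name `trending_score` and the sample values are illustrative assumptions. They are not taken from any of the benchmark implementations.

```python
def trending_score(short_count, long_count, short_interval, long_interval, smoothness):
    """Compute the trending score exactly as the formula above states.

    Both intervals must use the same unit (hours here); only their
    ratio matters. All names in this sketch are hypothetical.
    """
    return (short_count / (long_count + smoothness)) * (long_interval / short_interval)


# A query seen 30 times in the last hour and 20 times in the last day,
# with an assumed smoothness of 100:
score = trending_score(short_count=30, long_count=20,
                       short_interval=1, long_interval=24,
                       smoothness=100)
print(score)  # 6.0
```

With smoothness set to 0, a query whose rate is perfectly steady (longCount = 24 * shortCount) scores about 1. Scores well above 1 therefore indicate that the query is more frequent in the short interval than its long-term rate would predict.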

Each program should output the top N trending queries for each hour of the time span considered.
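
The top-N selection can be sketched as follows, assuming the scores have already been computed and grouped by hour. The input layout (a dict keyed by hour start) and the name `top_n_per_hour` are assumptions made for illustration. The actual implementations rely on each engine's own windowing and aggregation operators.

```python
import heapq


def top_n_per_hour(scores, n):
    """Pick the n highest-scoring queries for every hour.

    `scores` maps an hour identifier to a dict of query -> score
    (a hypothetical in-memory layout). Returns, per hour, a list of
    (query, score) pairs ordered from highest to lowest score.
    """
    return {
        hour: heapq.nlargest(n, by_query.items(), key=lambda kv: kv[1])
        for hour, by_query in scores.items()
    }


scores = {0: {"weather": 1.0, "elections": 3.0, "football": 2.0}}
print(top_n_per_hour(scores, 2))
```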

Input format

unix-time-in-millis \t query
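
A minimal sketch of reading one input record, assuming each line holds a millisecond Unix timestamp and the query text separated by a single tab character. The helper names `parse_line` and `hour_bucket` are hypothetical.

```python
MILLIS_PER_HOUR = 60 * 60 * 1000


def parse_line(line):
    """Split one input line into (timestamp_ms, query).

    Splits on the first tab only, so a query that itself contains
    a tab stays intact.
    """
    timestamp, query = line.rstrip("\n").split("\t", 1)
    return int(timestamp), query


def hour_bucket(timestamp_ms):
    """Round a millisecond timestamp down to the start of its hour."""
    return timestamp_ms - (timestamp_ms % MILLIS_PER_HOUR)


ts, query = parse_line("1480000000000\tcheap flights\n")
print(ts, query, hour_bucket(ts))
```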

Implementations

Native Flink - DataStream API
Native Flink - DataSet API
Native Spark - RDD API
Native Spark - DStream API: not yet implemented
Beam implementation - executable on bounded inputs only
Euphoria
