Benchmarks
We maintain a set of programs used to measure the performance of the Euphoria executors. To measure the runtime overhead of Euphoria, we also implement these benchmark programs in the native APIs provided by the execution engines. We aim to implement each program as naturally and idiomatically as is appropriate for the corresponding API. As a nice side effect, this also allows us to compare the programs from a readability point of view. Below is a list of the benchmark programs, briefly outlining the calculation and the expected format of the input data.
Given a log of user search queries from a site like Seznam or Google, the goal is to find "trending" queries, i.e. queries which become "hot" during a short period of time, e.g. the last hour. It boils down to comparing a given query's popularity during the last hour against its popularity during a longer period, for example the last day. The formula to compute a query's trending score is:
(shortCount / (longCount + smoothness)) * (longInterval / shortInterval)
where:
- shortCount: the number of occurrences of a query during the short interval, e.g. the last hour
- longCount: the number of occurrences of a query during the long interval, e.g. the last day
- shortInterval: the length of the short interval, e.g. 1 hour
- longInterval: the length of the long interval, e.g. 24 hours
- smoothness: a constant high enough to make every query appear to always have a certain base count/popularity
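As an illustration only, the formula above can be written as a small Python function. This is a sketch and not code taken from any of the benchmark implementations; the function name, the argument order, and the sample values (including smoothness = 100) are assumptions made for the example.

```python
def trending_score(short_count, long_count, short_interval, long_interval, smoothness):
    """Compare a query's short-term popularity against its long-term popularity.

    Both intervals must use the same unit (hours here). A positive
    `smoothness` also keeps the denominator non-zero when long_count is 0.
    """
    if short_interval <= 0 or long_interval <= 0:
        raise ValueError("intervals must be positive")
    return (short_count / (long_count + smoothness)) * (long_interval / short_interval)


# Hypothetical numbers: 50 hits in the last hour, 100 hits in the last day.
score = trending_score(50, 100, short_interval=1, long_interval=24, smoothness=100)
print(score)  # (50 / 200) * 24 = 6.0
```

With the same smoothness, a rare query seen twice in the last hour and twice in the last day scores only about 0.47, which is the effect the base count is meant to have: small absolute counts do not dominate the ranking.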
The programs' output should be the top N trending queries for each hour in the time span considered.
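A minimal, single-machine sketch of this output is shown below. It is not how the Flink, Spark, Beam, or Euphoria programs are written. It assumes that the short interval is one clock hour, that the long interval is the `long_hours` hourly buckets ending with that hour, and that only hours containing at least one query are reported. The function name and default parameter values are hypothetical.

```python
import heapq
from collections import Counter, defaultdict

HOUR_MS = 3_600_000  # milliseconds per hour


def top_trending_per_hour(events, n, long_hours=24, smoothness=100.0):
    """Return {hour_bucket: [(score, query), ...]} with the n best queries per hour.

    `events` is an iterable of (timestamp_ms, query) pairs.
    """
    # Count occurrences of each query per one-hour bucket.
    per_hour = defaultdict(Counter)
    for ts_ms, query in events:
        per_hour[ts_ms // HOUR_MS][query] += 1

    result = {}
    for hour in sorted(per_hour):
        short_counts = per_hour[hour]
        # Long window: the `long_hours` buckets ending with the current hour.
        long_counts = Counter()
        for h in range(hour - long_hours + 1, hour + 1):
            if h in per_hour:
                long_counts.update(per_hour[h])
        # longInterval / shortInterval equals long_hours, as shortInterval is 1 hour.
        scored = [
            ((count / (long_counts[q] + smoothness)) * long_hours, q)
            for q, count in short_counts.items()
        ]
        result[hour] = heapq.nlargest(n, scored)
    return result
```

Keeping per-hour counters and summing them for the long window avoids rescanning the raw log for every hour, at the cost of holding all hourly buckets in memory.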
Expected input format: unix-time-in-millis \t query
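A record in this format could be parsed as sketched below. The sketch assumes one record per line, a literal tab character as the separator, an integer timestamp in milliseconds since the Unix epoch, and a query that may contain spaces but no tabs. The function name and the sample line are made up for illustration.

```python
from datetime import datetime, timezone


def parse_record(line):
    """Split one log line into (timestamp_ms, query)."""
    # Split on the first tab only, so the rest of the line is kept as the query.
    ts_text, query = line.rstrip("\n").split("\t", 1)
    return int(ts_text), query


# Hypothetical sample line.
ts_ms, query = parse_record("1500000000000\teuphoria benchmarks\n")

# Convert milliseconds to a timezone-aware UTC datetime for inspection.
when = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
print(when.isoformat(), query)
```

A line without a tab, or with a non-numeric timestamp, raises `ValueError` here; a real job would need to decide whether to skip or report such records.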
- Native Flink - DataStream API
- Native Flink - DataSet API
- Native Spark - RDD API
- Native Spark - DStream API: not yet implemented
- Beam implementation - executable on bounded inputs only
- Euphoria