This repo contains our solution to the project of the Database System class at EPFL, developed by me and @manuleo.
We used Scala and Spark, restricted to RDDs for the first two tasks because Spark SQL and Spark DataFrames were not allowed there. The project is divided into three tasks:
- Implementation of the ROLLUP operator.
- Implementation of a MapReduce‑friendly theta‑join, following the 1‑Bucket‑Theta algorithm by Okcan and Riedewald, with additional optimizations on the Reduce side.
- Implementation of approximate k-nearest-neighbor (kNN) search via locality-sensitive hashing (LSH) for Jaccard similarity.
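
To make the first task concrete, here is a minimal sketch of what ROLLUP computes, written with plain Scala collections rather than Spark RDDs. It is not the code in this repo: the names `RollupSketch`, `Row` and `rollupSum` are hypothetical, the aggregate is fixed to a sum, and rolled-up attributes are shown as `"*"` where SQL would use NULL.

```scala
object RollupSketch {
  // Hypothetical row type: ordered grouping attributes plus one numeric measure.
  final case class Row(keys: Vector[String], value: Double)

  // ROLLUP over n grouping attributes produces n + 1 grouping levels:
  // the full key, every shorter prefix, and the empty key (grand total).
  def rollupSum(rows: Seq[Row], n: Int): Map[Vector[String], Double] = {
    // Naive approach: emit each row once per level, keyed by its prefix,
    // padding the rolled-up positions with "*".
    val perLevel = for {
      level <- (0 to n).toSeq
      row   <- rows
    } yield {
      val key = row.keys.take(level) ++ Vector.fill(n - level)("*")
      key -> row.value
    }
    // Sum the measure within each distinct key.
    perLevel.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2).sum }
  }
}
```

For rows keyed by (country, city), `RollupSketch.rollupSum(rows, 2)` returns one sum per (country, city) pair, one per country, and one grand total. This version emits each row n + 1 times; an RDD-based implementation could instead aggregate the finest level first and derive the coarser levels from it.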
We also wrote unit tests for the last two tasks.
Finally, we wrote a report on the results we obtained.
Our project has been graded 6/6.