EPFL Database Systems class - Project2

This repo contains our solution to Project 2 of the Database Systems class at EPFL, developed by me and @manuleo.

We used Scala and Spark. The first two tasks use RDDs only, since Spark SQL and Spark DataFrames were not allowed for them. The project is divided into three tasks:

1. Implementation of the ROLLUP operator.
2. Implementation of a MapReduce-friendly theta-join, following the 1-Bucket-Theta algorithm by Okcan and Riedewald, with additional Reduce-side optimizations.
3. Implementation of approximate k-nearest-neighbor (kNN) search via locality-sensitive hash functions for Jaccard similarity.
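To make the ROLLUP task concrete, here is a minimal sketch of the operator's semantics. It is not the code in this repo: it uses plain Scala collections instead of RDDs so that it runs without Spark, and the `Row` type, the `rollupSum` name, the SUM aggregate, and the `"*"` placeholder for rolled-up attributes (SQL would show NULL) are all assumptions made for illustration.

```scala
object RollupSketch {
  // Hypothetical row type: grouping attributes followed by a numeric measure.
  final case class Row(keys: Vector[String], value: Double)

  // ROLLUP over n grouping attributes is the union of GROUP BY on every
  // prefix of the attribute list: (k1..kn), (k1..kn-1), ..., ().
  // Attributes that have been rolled up are replaced by "*".
  def rollupSum(rows: Seq[Row], n: Int): Map[Vector[String], Double] =
    (n to 0 by -1).flatMap { len =>
      rows
        .groupBy(r => r.keys.take(len) ++ Vector.fill(n - len)("*"))
        .map { case (key, group) => key -> group.map(_.value).sum }
    }.toMap
}
```

For example, rolling up `(country, city)` rows yields one total per city, one per country, and one grand total. On an RDD the same grouping could be expressed by emitting one key per prefix and combining values with `reduceByKey`.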
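The idea behind the theta-join task can be sketched as follows. This is a simplified illustration of the 1-Bucket-Theta scheme, not the implementation in this repo, and it does not include the Reduce optimizations mentioned above. The join matrix is cut into a `rowBuckets x colBuckets` grid of regions; each tuple of `R` picks a random row band and is replicated to every region in that band, and each tuple of `S` does the same for a random column band. Plain Scala collections stand in for RDDs, and all names (`thetaJoin`, `rowBuckets`, `colBuckets`, `seed`) are hypothetical.

```scala
import scala.util.Random

object OneBucketThetaSketch {
  def thetaJoin[A, B](r: Seq[A], s: Seq[B], rowBuckets: Int, colBuckets: Int,
                      seed: Long = 42L)(theta: (A, B) => Boolean): Seq[(A, B)] = {
    val rng = new Random(seed)

    // "Map" phase for R: choose a random row band i, then send the tuple
    // to every region (i, j) in that band.
    val rByRegion: Map[(Int, Int), Seq[A]] =
      r.flatMap { a =>
        val i = rng.nextInt(rowBuckets)
        (0 until colBuckets).map(j => ((i, j), a))
      }.groupBy(_._1).map { case (region, pairs) => region -> pairs.map(_._2) }

    // "Map" phase for S: choose a random column band j, then send the tuple
    // to every region (i, j) in that band.
    val sByRegion: Map[(Int, Int), Seq[B]] =
      s.flatMap { b =>
        val j = rng.nextInt(colBuckets)
        (0 until rowBuckets).map(i => ((i, j), b))
      }.groupBy(_._1).map { case (region, pairs) => region -> pairs.map(_._2) }

    // "Reduce" phase: each region joins only the tuples it received.
    // A pair (a, b) meets in exactly one region, so there are no duplicates.
    rByRegion.toSeq.flatMap { case (region, as) =>
      val bs = sByRegion.getOrElse(region, Seq.empty[B])
      for (a <- as; b <- bs if theta(a, b)) yield (a, b)
    }
  }
}
```

Because the assignment is random and independent of the join predicate, the scheme works for any `theta`, and the load per region stays balanced in expectation. The price is that each tuple is replicated to several regions.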
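For the approximate kNN task, the sketch below shows one common way to build a locality-sensitive hash for Jaccard similarity, namely MinHash. Whether this repo uses exactly this construction is an assumption; the names (`minHash`, `signature`, `approxKnn`), the use of `MurmurHash3` as the seeded hash, and the final re-ranking by exact Jaccard similarity are illustrative choices. Sets are assumed to be non-empty, and plain Scala collections again stand in for RDDs.

```scala
import scala.util.hashing.MurmurHash3

object MinHashLshSketch {
  def jaccard(a: Set[String], b: Set[String]): Double =
    if (a.isEmpty && b.isEmpty) 1.0
    else (a intersect b).size.toDouble / (a union b).size

  // One MinHash function: the smallest seeded hash over the set's elements.
  // Two sets agree on it with probability close to their Jaccard similarity.
  def minHash(seed: Int)(items: Set[String]): Int =
    items.map(x => MurmurHash3.stringHash(x, seed)).min

  // Concatenating several MinHash values makes collisions more selective.
  def signature(seeds: Seq[Int])(items: Set[String]): Seq[Int] =
    seeds.map(seed => minHash(seed)(items))

  // Candidates are the data points that share the query's bucket; they are
  // then ranked by exact Jaccard similarity and cut to the top k.
  def approxKnn(data: Map[String, Set[String]], query: Set[String],
                k: Int, seeds: Seq[Int]): Seq[String] = {
    val buckets = data.groupBy { case (_, items) => signature(seeds)(items) }
    buckets
      .getOrElse(signature(seeds)(query), Map.empty[String, Set[String]])
      .toSeq
      .sortBy { case (_, items) => -jaccard(items, query) }
      .take(k)
      .map(_._1)
  }
}
```

Using more seeds per signature lowers the chance that dissimilar sets share a bucket, but also raises the chance of missing true neighbors. Repeating the lookup with several independent signatures is the usual way to recover recall.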

We also wrote unit tests for the last two tasks.

Finally, we wrote a report on the results we obtained.

Our project has been graded 6/6.

