ChainsawPerson/Advanced-DB-2025

Advanced-DB-2025

How to run the notebook locally:

Windows 10/11:

Requirements:


  1. Python 3
  2. PySpark
  3. Apache Hadoop
  4. Apache Sedona
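As a quick sanity check on the Python-side requirements, the sketch below reports which of the pip packages used in this README (`pyspark` and `apache-sedona`) are installed and at what version. The helper names `installed_version` and `report_requirements` are illustrative and not part of this repository. Apache Hadoop is not a pip package, so this check does not cover it.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of a pip package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report_requirements(packages=("pyspark", "apache-sedona")):
    """Map each package name to its installed version (None when not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Print the interpreter version followed by one line per package.
    print("Python", sys.version.split()[0])
    for name, version in report_requirements().items():
        print(name, version or "NOT INSTALLED")
```

Running the script before and after the set up steps shows whether the `pip install` commands took effect in the interpreter you intend to use for the notebook.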

Set up:


  1. Download and install Python. We will be using Python 3.11.9 for compatibility. Make sure to add Python to Path during installation.

  2. Open command prompt and install pyspark with pip:

    pip install pyspark
    

    Download and extract the Spark distribution from the Apache Spark website (we will be using spark-3.5.4, pre-built for Hadoop 3).

    • After extracting the archive, create the SPARK_HOME system environment variable and set it to path\to\spark-3.5.4-bin-hadoop3.
    • Add path\to\spark-3.5.4-bin-hadoop3\bin to Path.
  3. Download Apache Hadoop and extract its files to your preferred path (we use version 3.3.6). In the System Environment Variables, make the following additions:

    • Add path\to\hadoop\bin to Path
    • Create a variable HADOOP_HOME with value path\to\hadoop
  4. Install Apache Sedona using pip:

    pip install apache-sedona
    

    We will need to download some files for it to work correctly:

    We will add both of these files to path\to\spark-3.5.4-bin-hadoop3\jars
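Once the environment variables are in place, a small check like the one below can confirm that SPARK_HOME and HADOOP_HOME are set and that their bin directories are on Path. This is a minimal sketch: the function name `check_spark_env` is hypothetical, and it only inspects the variables named in the steps above, not the contents of the Spark or Hadoop folders.

```python
import os


def check_spark_env(env, path_sep=os.pathsep):
    """Return a list of problems found in the given environment mapping."""
    problems = []
    # Normalize every Path entry so that slashes and letter case compare reliably.
    path_entries = [
        os.path.normcase(os.path.normpath(p))
        for p in env.get("PATH", "").split(path_sep)
        if p
    ]
    for var in ("SPARK_HOME", "HADOOP_HOME"):
        home = env.get(var)
        if not home:
            problems.append(f"{var} is not set")
            continue
        bin_dir = os.path.normcase(os.path.normpath(os.path.join(home, "bin")))
        if bin_dir not in path_entries:
            problems.append(f"{var} bin directory is not on Path")
    return problems


if __name__ == "__main__":
    issues = check_spark_env(os.environ)
    print("Environment looks OK" if not issues else "\n".join(issues))
```

Open a new command prompt before running it, since changes to system environment variables are not picked up by terminals that were already open.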

About

Report on Advanced Database NTUA 2024-25
