How to run the notebook locally:
- Python 3
- PySpark
- Apache Hadoop
- Apache Sedona
- Download and install Python. We will be using Python 3.11.9 for compatibility. Make sure to add Python to Path.
- Open a command prompt and install PySpark with pip:
pip install pyspark
- Download and extract the Spark distribution from the Apache Spark website (we will be using spark-3.5.4 built for Hadoop 3).
- After extracting the archive, create a SPARK_HOME system environment variable set to path\to\spark-3.5.4-bin-hadoop3.
- Add path\to\spark-3.5.4-bin-hadoop3\bin to Path.
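On Windows, the two Spark settings above can also be applied from a Command Prompt instead of the System Properties dialog. This is a sketch; the install path is an example, so substitute the folder you actually extracted Spark to (note that setx truncates values longer than 1024 characters, so the GUI is safer if your Path is already long):

```
setx SPARK_HOME "C:\path\to\spark-3.5.4-bin-hadoop3"
setx PATH "%PATH%;C:\path\to\spark-3.5.4-bin-hadoop3\bin"
```

Open a new Command Prompt afterwards, since setx only affects future sessions.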
- Download Apache Hadoop and extract its files to your preferred path (we use version 3.3.6). In the System Environment Variables, make the following additions:
  - Add path\to\hadoop\bin to Path
  - Create a variable HADOOP_HOME with value path\to\hadoop
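With both the Spark and Hadoop variables in place, you can sanity-check the setup with a short script. The helper below is a minimal sketch (the path fragments are just the examples used in this guide); it reports which expected environment entries are missing:

```python
import os

def missing_env(environ, required_vars, required_path_dirs):
    """Return a list of problems with the Spark/Hadoop environment setup.

    environ            -- a mapping like os.environ
    required_vars      -- variable names that must be set (e.g. SPARK_HOME)
    required_path_dirs -- directory fragments that must appear in Path
    """
    problems = []
    for var in required_vars:
        if not environ.get(var):
            problems.append(f"{var} is not set")
    # Windows exposes the variable as Path, other systems as PATH.
    path = environ.get("PATH", environ.get("Path", ""))
    for d in required_path_dirs:
        if d.lower() not in path.lower():
            problems.append(f"{d} is not on Path")
    return problems

# Example check against the variables created in the steps above.
issues = missing_env(
    os.environ,
    required_vars=["SPARK_HOME", "HADOOP_HOME"],
    required_path_dirs=["spark-3.5.4-bin-hadoop3\\bin", "hadoop\\bin"],
)
print(issues or "environment looks OK")
```

Remember to restart your terminal (or IDE) after editing the System Environment Variables, or the script will still see the old values.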
- Install Apache Sedona using pip:
pip install apache-sedona
- Sedona needs two extra JAR files to work correctly: per the Sedona setup guide, these are the sedona-spark-shaded and geotools-wrapper JARs matching your Spark, Scala, and Sedona versions. Add both files to path\to\spark-3.5.4-bin-hadoop3\jars.
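To confirm the JARs landed in the right folder, a small script can scan the Spark jars directory. This is a sketch under the assumption that the required files are the sedona-spark-shaded and geotools-wrapper JARs named in the Sedona setup guide; adjust the prefixes if your versions differ:

```python
from pathlib import Path

def missing_jars(jars_dir, required_prefixes=("sedona-spark-shaded", "geotools-wrapper")):
    """Return the required JAR name prefixes with no matching file in jars_dir."""
    names = [p.name for p in Path(jars_dir).glob("*.jar")]
    return [
        prefix for prefix in required_prefixes
        if not any(name.startswith(prefix) for name in names)
    ]

# Example: point this at your Spark installation's jars folder.
# print(missing_jars(r"path\to\spark-3.5.4-bin-hadoop3\jars"))
```

An empty list means both JARs were found and the notebook should be able to start Sedona.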