
awaazde/big-data-stack-practice

 
 


To develop and test AWS Glue scripts in a local environment, this package mocks the underlying AWS functionality. To keep the application platform independent, the core functionality has been dockerized.

Requirements

  1. Docker,
  2. Postgres,
  3. Java v8,
  4. Python 3

Components

  1. Hive,
  2. MinIO object storage (credentials: minio / minio123), local endpoint: localhost:9000,
  3. Trino database (user: admin, no password), local endpoint: localhost:8080,
  4. AWS Glue libs
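
Once the stack is running, it can be handy to confirm that the two local endpoints listed above accept connections. The following is a minimal sketch using only the Python standard library. The helper names `port_open` and `check_endpoints` are hypothetical and not part of this repository, and the check only verifies that the TCP port is open, not that MinIO or Trino is fully initialized.

```python
import socket

# Endpoints as listed in the Components section of this README.
ENDPOINTS = {
    "minio": ("localhost", 9000),
    "trino": ("localhost", 8080),
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def check_endpoints(endpoints=ENDPOINTS):
    """Map each service name to whether its port accepts connections."""
    return {
        name: port_open(host, port)
        for name, (host, port) in endpoints.items()
    }


if __name__ == "__main__":
    for name, ok in check_endpoints().items():
        print(f"{name}: {'reachable' if ok else 'not reachable'}")
```

If either service is reported as not reachable, the containers may still be starting, or the stack may not have been started with make up.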

Steps

  1. Install & run docker,
  2. Setup postgres db,
  3. Clone this repository,
  4. Install the required dependencies & set environment variables using the make install command,
  5. Restart the terminal,
  6. From inside the repository directory, run make up

Commands

  1. Installation: make install

     Note: After installation, one needs to restart the terminal. This ensures that all the required environment variables are set permanently.
    
  2. To run: make up

  3. To stop: make down

Environment Variables

Note: the following environment variables are set during the installation process. They are listed here for reference only and are not meant to be set manually.
  1. AWS_REGION,
  2. AWS_ACCESS_KEY_ID,
  3. AWS_SECRET_ACCESS_KEY,
  4. SPARK_HOME,
  5. PYTHONPATH
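
To check that the installation step actually exported these variables (for example, after restarting the terminal), a small script can report any that are missing. This is a sketch under assumptions: the function name `missing_env_vars` is hypothetical and not part of this repository, and a variable set to an empty string is treated as missing.

```python
import os

# Variable names as listed in the Environment Variables section.
REQUIRED_VARS = (
    "AWS_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "SPARK_HOME",
    "PYTHONPATH",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All expected environment variables are set.")
```

If any variables are reported missing, re-running make install and restarting the terminal is the first thing to try.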

References

  1. Mock AWS Athena for your ETL tests,
  2. Big data stack practice,
  3. Developing and Testing ETL Scripts Locally Using the AWS Glue ETL Library
