Data Engineering Nanodegree Project Collection

Project 1 & 2 - Data Modeling with PostgreSQL & Apache Cassandra

In Projects 1 and 2, I model user activity data for a music streaming app called Sparkify. The first project builds a relational data model in PostgreSQL with a Python ETL pipeline. The second project builds a NoSQL data model in Apache Cassandra, also with a Python ETL pipeline. Skills: PostgreSQL, Apache Cassandra, ETL pipelines, data normalization/denormalization, Python
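
As a rough illustration of the relational half of this work, the sketch below loads one activity event into a dimension table and a fact table. It is not the project's actual schema: the table names, column names, and sample event are hypothetical, and `sqlite3` stands in for PostgreSQL so the snippet runs without a database server.

```python
import sqlite3

# Hypothetical log event; the field names are assumptions for illustration.
event = {"user_id": 7, "first_name": "Ada", "level": "free",
         "song": "Example Song", "ts": 1541106106796}

conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
cur = conn.cursor()

# Dimension table: one row per user.
cur.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, level TEXT)"
)
# Fact table: one row per play event, pointing at the user dimension.
cur.execute(
    """CREATE TABLE songplays (
           songplay_id INTEGER PRIMARY KEY AUTOINCREMENT,
           start_time  INTEGER NOT NULL,
           user_id     INTEGER NOT NULL REFERENCES users (user_id),
           song        TEXT
       )"""
)


def load_event(cur, e):
    """Insert one event: refresh the user row, then append a fact row."""
    # INSERT OR REPLACE keeps a single row per user_id.
    # (In PostgreSQL the equivalent would be INSERT ... ON CONFLICT.)
    cur.execute(
        "INSERT OR REPLACE INTO users (user_id, first_name, level) VALUES (?, ?, ?)",
        (e["user_id"], e["first_name"], e["level"]),
    )
    cur.execute(
        "INSERT INTO songplays (start_time, user_id, song) VALUES (?, ?, ?)",
        (e["ts"], e["user_id"], e["song"]),
    )


load_event(cur, event)
# A later event from the same user, who has since changed subscription level.
load_event(cur, {**event, "level": "paid", "ts": event["ts"] + 1000})
conn.commit()
```

Keeping user attributes in their own table is the normalization step; a Cassandra model would instead repeat those attributes in each query-specific table.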

Project 3 - Cloud Data Warehousing

In this project, I build a data warehouse on AWS and an ETL pipeline that (1) extracts Sparkify’s data from S3; (2) stages it in Amazon Redshift; and (3) transforms it into a set of fact and dimension tables. Skills: Amazon Redshift, AWS CLI, AWS SDK, Infrastructure-as-Code (IaC), Python, SQL
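
The stage-then-transform flow can be sketched as two SQL statements built in Python. This is an assumption-laden outline rather than the project's code: the table names, bucket path, role ARN, region, and JSON input format are placeholders, and only the SQL text is produced (running it would require a Redshift connection).

```python
def build_copy_sql(table, s3_path, iam_role_arn, region="us-west-2"):
    """Return a Redshift COPY statement that stages JSON files from S3.

    All arguments are expected to come from trusted configuration,
    since they are interpolated directly into the SQL text.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS JSON 'auto'\n"
        f"REGION '{region}';"
    )


# Hypothetical transform step: fill a dimension table from the staging table.
INSERT_USERS_SQL = """
INSERT INTO users (user_id, first_name, level)
SELECT DISTINCT user_id, first_name, level
FROM staging_events
WHERE user_id IS NOT NULL;
"""

copy_sql = build_copy_sql(
    table="staging_events",                                     # placeholder
    s3_path="s3://example-bucket/log_data",                     # placeholder
    iam_role_arn="arn:aws:iam::123456789012:role/example-role", # placeholder
)
print(copy_sql)
```

Staging first keeps the S3 load a bulk operation and leaves the reshaping into fact and dimension tables to SQL inside the warehouse.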

Project 4 - Data Lakes with Apache Spark

In this project, I build a data lake on AWS and an ETL pipeline that (1) extracts data from S3; (2) processes it using Apache Spark; and (3) loads it back into S3 as a set of dimension tables. The Spark job is then deployed on an AWS EMR cluster. Skills: Apache Spark, AWS EMR
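
A minimal PySpark sketch of the three steps might look like the following. The column names, table name, and paths are hypothetical, and Parquet is assumed as the output format since the description above does not name one. PySpark is imported inside the function, so the path helper can be used without Spark installed.

```python
def output_path(prefix, table):
    """Join an output prefix and a table name into one location."""
    return f"{prefix.rstrip('/')}/{table}"


def run_etl(input_path, output_prefix):
    """Read raw JSON, derive a users dimension table, and write it out."""
    # Imported here so the module loads even where PySpark is unavailable.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("data-lake-sketch").getOrCreate()

    # (1) Extract: read the raw JSON records.
    events = spark.read.json(input_path)

    # (2) Process: keep one row per user (column names are assumptions).
    users = (
        events.select("user_id", "first_name", "level")
        .dropDuplicates(["user_id"])
    )

    # (3) Load: write the dimension table back to storage.
    users.write.mode("overwrite").parquet(output_path(output_prefix, "users"))

    spark.stop()


if __name__ == "__main__":
    # Placeholder locations; on EMR these would typically be S3 URIs.
    print(output_path("s3a://example-bucket/output/", "users"))
```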

Project 5 - Data Pipelines with Apache Airflow

In this project, I automate a set of ETL pipelines and the construction of a data warehouse using Apache Airflow. The development process includes (1) configuring Airflow to automate the pipelines; and (2) writing custom operators to perform tasks such as staging, loading, transformation, and validation. Skills: Apache Airflow, Amazon Redshift, Python, SQL
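
The validation task is the easiest one to show in isolation. The sketch below is the kind of check a custom operator might wrap, not the operator from the project: the function name, table names, and fake query runner are hypothetical, and the Airflow wiring is left out so the snippet runs on its own.

```python
def check_tables_not_empty(run_query, tables):
    """Raise ValueError if any table has no rows.

    `run_query` is any callable that takes a SQL string and returns a list
    of row tuples, for example a thin wrapper around a database hook.
    """
    for table in tables:
        rows = run_query(f"SELECT COUNT(*) FROM {table}")
        if not rows or rows[0][0] < 1:
            raise ValueError(f"Data quality check failed: {table} is empty")
    return True


# Stand-in for a warehouse connection: returns canned row counts.
fake_counts = {"users": 3, "songplays": 0}


def fake_run_query(sql):
    table = sql.rsplit(" ", 1)[-1]  # last word of the query is the table name
    return [(fake_counts[table],)]


print(check_tables_not_empty(fake_run_query, ["users"]))
```

Raising an exception is what lets a scheduler mark the task as failed, so a bad load stops the pipeline instead of passing silently.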

chenliny-zz/Udacity_Data_Engineering
