Amazing Prime loves the dataset and wants to keep it updated on a daily basis. They would like to create an automated pipeline that takes in new data, performs the appropriate transformations, and loads the data into existing tables.
Write an ETL function that reads the Wikipedia data, the Kaggle metadata, and the MovieLens rating data, and creates three separate DataFrames from the following source files (a sketch of this step appears below):
wikipedia-movies.json, movies_metadata.csv and ratings.csv
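A minimal sketch of the extraction step, assuming the three source files sit in the working directory; the function name `extract_data` and the argument names are illustrative, not part of the original script:

```python
import json

import pandas as pd


def extract_data(wiki_file, kaggle_file, ratings_file):
    """Read the three raw sources and return them as separate DataFrames."""
    # The Wikipedia data is a JSON list of movie records
    with open(wiki_file, mode='r') as file:
        wiki_movies_raw = json.load(file)
    wiki_movies_df = pd.DataFrame(wiki_movies_raw)

    # The Kaggle metadata and MovieLens ratings are flat CSV files
    kaggle_metadata = pd.read_csv(kaggle_file, low_memory=False)
    ratings = pd.read_csv(ratings_file)

    return wiki_movies_df, kaggle_metadata, ratings


wiki_movies_df, kaggle_metadata, ratings = extract_data(
    'wikipedia-movies.json', 'movies_metadata.csv', 'ratings.csv')
```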
Load the data into a PostgreSQL movie database.
```python
import time

import pandas as pd
from sqlalchemy import create_engine

# db_password is assumed to be defined elsewhere (e.g., imported from a config file)
db_string = f"postgresql://postgres:{db_password}@localhost:5432/movie_data"

# create database engine
engine = create_engine(db_string)

# load the movies DataFrame, replacing any existing table
movies_df.to_sql(name='movies', con=engine, if_exists='replace')

# import the ratings data in chunks because the file is too large to load at once
rows_imported = 0
# get the start_time from time.time()
start_time = time.time()
for data in pd.read_csv('ratings.csv', chunksize=1000000):
    print(f'importing rows {rows_imported} to {rows_imported + len(data)}...', end='')
    data.to_sql(name='ratings', con=engine, if_exists='append')
    rows_imported += len(data)
    # add elapsed time to final print out
    print(f'Done. {time.time() - start_time} total seconds elapsed')
```
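As an optional sanity check (not part of the original script), the row counts can be confirmed once the import finishes; the snippet below assumes the same `engine` object created above:

```python
import pandas as pd

# confirm that both tables were populated (assumes `engine` from the load script above)
for table in ('movies', 'ratings'):
    count = pd.read_sql_query(f'SELECT COUNT(*) FROM {table}', engine)
    print(f'{table}: {count.iloc[0, 0]} rows')
```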