
JainTanisha/MapReduce-Analysis-on-Amazon-Food-Review-Data

Project Title: MapReduce Analysis on Amazon Food Review Dataset

Summary: This project analyzes the ‘Amazon Fine Food Reviews’ dataset, which consists of reviews of fine foods sold on Amazon. The data span a period of more than 10 years and include all ~500,000 reviews up to October 2012. Each review includes product and user information, a rating, and a plain-text review.

The raw dataset is first explored with Tableau, and the data files are then cleaned with an R script in RStudio. The MapReduce analysis uses several MapReduce design patterns, such as Partitioning and Distinct. The project also uses Apache Hive and Apache Mahout, and runs MapReduce jobs in the cloud on Amazon Elastic MapReduce (EMR), part of Amazon Web Services.

The analysis can give insight into product sales and areas for improvement, and can surface honest user reviews that help other customers buying similar products.
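The Distinct and Partitioning patterns mentioned above can be sketched as follows. This is a minimal single-process Python simulation, not the project's Hadoop code: the sample rows are invented, the column names ProductId, UserId and Score are assumed from the Kaggle CSV header, and run_mapreduce, score_partitioner and the other helpers are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows; column names are assumed from the Kaggle
# Reviews.csv header, and the values are invented for illustration.
REVIEWS = [
    {"ProductId": "P1", "UserId": "U1", "Score": 5},
    {"ProductId": "P1", "UserId": "U2", "Score": 4},
    {"ProductId": "P2", "UserId": "U1", "Score": 1},
    {"ProductId": "P3", "UserId": "U3", "Score": 5},
]


def run_mapreduce(records, mapper, reducer):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    pairs = [kv for rec in records for kv in mapper(rec)]
    pairs.sort(key=itemgetter(0))  # stand-in for the framework's shuffle and sort
    out = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        out.extend(reducer(key, [v for _, v in group]))
    return out


# Distinct pattern: the mapper emits the field of interest as the key with
# an empty value; after grouping, the reducer writes each key exactly once.
def distinct_mapper(rec):
    yield rec["UserId"], None


def distinct_reducer(key, values):
    yield key


# Partitioning pattern: a partitioner routes each record to one output
# partition based on its star rating (Score 1..5 -> partition 0..4).
NUM_PARTITIONS = 5


def score_partitioner(score, num_partitions=NUM_PARTITIONS):
    return (score - 1) % num_partitions


def partition_records(records):
    partitions = defaultdict(list)
    for rec in records:
        partitions[score_partitioner(rec["Score"])].append(rec["ProductId"])
    return dict(partitions)


if __name__ == "__main__":
    print(run_mapreduce(REVIEWS, distinct_mapper, distinct_reducer))
    print(partition_records(REVIEWS))
```

On a real cluster the shuffle and the partitioning step are provided by the framework (for example, a custom Partitioner class in Hadoop); here they are replaced by a sort and a dictionary so the logic can run standalone.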

Dataset Link: https://www.kaggle.com/snap/amazon-fine-food-reviews (originally this dataset was published on http://snap.stanford.edu/data/web-FineFoods.html)

Tableau Desktop Link: https://public.tableau.com/profile/tanishajain

Please see the detailed report of the project here: https://github.com/JainTanisha/MapReduce-Analysis-on-Amazon-Food-Review-Data/blob/master/ADBMS-FinalProjectReport.pdf