Project Title: MapReduce Analysis on Amazon Food Review Dataset
Summary: This project analyzes the 'Amazon Fine Food Reviews' dataset. The raw dataset consists of reviews of fine foods from Amazon. The data span more than 10 years and include all ~500,000 reviews up to October 2012. Each review includes product ID and user information, a rating, and a plain-text review. The raw dataset is first explored in Tableau. The data files are then cleaned with an R script in RStudio. The MapReduce analysis uses several MapReduce patterns, such as Partitioning and Distinct. Apache Hive and Apache Mahout are also used. In addition, Amazon Elastic MapReduce is used to run the MapReduce algorithms in the cloud on Amazon Web Services. The analysis can support Amazon's product sales, point to areas for improvement, and surface honest user reviews for other shoppers buying similar products.
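For readers unfamiliar with the patterns named above, here is a minimal sketch of the Distinct and Partitioning patterns. It is a plain-Python simulation of the map, shuffle, and reduce phases, not the project's actual Hadoop code. The sample records, the (product_id, user_id, score) layout, the assumption that scores run from 1 to 5, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample records: (product_id, user_id, score).
# The field layout is an assumption made for this sketch.
REVIEWS = [
    ("B001E4KFG0", "A3SGXH7AUHU8GW", 5),
    ("B00813GRG4", "A1D87F6ZCVE5NK", 1),
    ("B001E4KFG0", "ABXLMWJIXXAIN", 4),
    ("B00813GRG4", "A1D87F6ZCVE5NK", 1),
]


def map_distinct(record):
    """Map phase: emit the product id as the key with an empty value."""
    product_id, _user_id, _score = record
    yield product_id, None


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_distinct(key, _values):
    """Reduce phase: emit each key once, whatever the number of values."""
    yield key


def distinct_products(records):
    """Run the Distinct pattern end to end and return sorted unique ids."""
    pairs = (pair for record in records for pair in map_distinct(record))
    groups = shuffle(pairs)
    return sorted(
        out for key, values in groups.items()
        for out in reduce_distinct(key, values)
    )


def partition_by_score(records, num_partitions=5):
    """Partitioning pattern: route each record to a partition by its score.

    Assumes integer scores from 1 to num_partitions; score s goes to
    partition s - 1, as a custom partitioner would decide per record.
    """
    partitions = {index: [] for index in range(num_partitions)}
    for record in records:
        partitions[(record[2] - 1) % num_partitions].append(record)
    return partitions


if __name__ == "__main__":
    print(distinct_products(REVIEWS))
    for index, records in partition_by_score(REVIEWS).items():
        print(index, len(records))
```

In a real Hadoop job the shuffle step is performed by the framework, and the partitioning rule would live in a custom partitioner class rather than in a helper function like the one above.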
Dataset Link: https://www.kaggle.com/snap/amazon-fine-food-reviews (originally published at http://snap.stanford.edu/data/web-FineFoods.html)
Tableau Desktop Link: https://public.tableau.com/profile/tanishajain
Detailed Project Report: https://github.com/JainTanisha/MapReduce-Analysis-on-Amazon-Food-Review-Data/blob/master/ADBMS-FinalProjectReport.pdf