This project explores a dataset of the 1000 most popular movies in the IMDB database from 2006 to 2016. The project is divided into 6 main components:
- Problem Statement
- Data Wrangling
- Questions and EDA
- Conclusion
- Actionable Insights
- Communicate
Components 1 through 5 are captured in a Jupyter Notebook. Component 6 is delivered as a presentation with a voice overlay (you will have to download these from the links).
- Problem Statement
After collecting some initial questions, I came up with a hypothetical problem: in 2017, a production company, ABC, decides to produce movies that will perform best in terms of revenue, popularity, and acclaim. The company approaches an agency, XYZ, and asks it to identify the characteristics of movies that will help ABC achieve this goal.
- Data Wrangling
I gathered the data, then examined and cleaned it to prepare it for EDA.
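The cleaning step described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the notebook's actual code: the column names (`Title`, `Year`, `Rating`, `Revenue`), the sample rows, and the `clean_rows` helper are all assumptions made for the example.

```python
import csv
import io

# Made-up placeholder rows; the column names are assumptions,
# not the real schema of the IMDB dataset.
SAMPLE_CSV = """Title,Year,Rating,Revenue
Movie A,2014,8.1,333.13
Movie B,2012,7.0,
Movie C,2016,6.2,56.50
Movie A,2014,8.1,333.13
"""


def clean_rows(csv_text):
    """Drop duplicate rows and rows with missing revenue; convert numeric fields."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["Title"], row["Year"])
        # Skip repeated (title, year) pairs and rows without a revenue value.
        if key in seen or not row["Revenue"].strip():
            continue
        seen.add(key)
        cleaned.append({
            "title": row["Title"].strip(),
            "year": int(row["Year"]),
            "rating": float(row["Rating"]),
            "revenue": float(row["Revenue"]),
        })
    return cleaned


movies = clean_rows(SAMPLE_CSV)
print(len(movies), "rows kept")
```

Dropping rows with missing revenue is only one possible choice; imputing a value or keeping the rows for questions that do not involve revenue would be equally reasonable, depending on the question being asked.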
- Questions and EDA
Then I added more questions that align with the Problem Statement, used them to explore the data through descriptive statistics and visualization, and noted my findings from the exploration.
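As a rough illustration of the descriptive-statistics side of this exploration, the sketch below summarizes revenue and averages it per year. It is not taken from the notebook: the records are made-up placeholders, and the field names and the `summarize` and `mean_by_year` helpers are hypothetical. Visualization is left out to keep the example dependency-free.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up placeholder records; field names are assumptions for illustration.
movies = [
    {"title": "Movie A", "year": 2014, "rating": 8.0, "revenue": 300.0},
    {"title": "Movie B", "year": 2014, "rating": 6.0, "revenue": 100.0},
    {"title": "Movie C", "year": 2016, "rating": 7.0, "revenue": 50.0},
]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "min": min(values),
        "median": median(values),
        "mean": mean(values),
        "max": max(values),
    }


def mean_by_year(rows, field):
    """Average a numeric field per release year, ordered by year."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["year"]].append(row[field])
    return {year: mean(vals) for year, vals in sorted(groups.items())}


revenue_summary = summarize([m["revenue"] for m in movies])
revenue_by_year = mean_by_year(movies, "revenue")
print(revenue_summary)
print(revenue_by_year)
```

The same grouping helper could be pointed at `rating` instead of `revenue` to compare acclaim across years.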
- Conclusion
I drew conclusions from my data exploration in this section.
- Actionable Insights
In this section, I derived actionable insights from the exploration and conclusions to address the Problem Statement.
- Communicate
Finally, I communicated my results through a presentation with a voice overlay.