In this Python project, I am building a movie recommendation system based on user interactions, for example:
- previously watched movies
- the user's search query
I use the movie data CSV from the URL https://query.data.world/s/uikepcpffyo2nhig52xxeevdialfl7
I followed these steps to implement the project:
- Loaded the data from the URL above.
- Selected the relevant columns from the dataset.
- Prepared the text content so that filtering algorithms can be applied to it.
- Extracted keywords for each record.
- Created a bag of words.
- Dropped all other, irrelevant columns.
- Converted the bag of words to a count matrix with CountVectorizer().
- Generated the cosine similarity matrix.
- Implemented the recommender function.
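The steps above can be sketched roughly as follows. This is a minimal sketch on a hand-made toy table, not the project's actual code: the column names (`Title`, `Genre`, `Director`, `Plot`) and all rows are assumptions for illustration, and the keyword extraction step is replaced by simple lowercasing so the example runs without NLTK data.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for the movie CSV; column names and rows are made up.
# The real data could be loaded with pd.read_csv() from the URL above.
df = pd.DataFrame({
    "Title": ["Space Quest", "Star Voyage", "Love in Paris"],
    "Genre": ["Sci-Fi Adventure", "Sci-Fi Adventure", "Romance Drama"],
    "Director": ["Ann Lee", "Ann Lee", "Bob Ray"],
    "Plot": ["A crew explores a distant galaxy",
             "A pilot explores a distant planet",
             "Two strangers fall in love in Paris"],
})

# Merge each director name into one token so "Ann Lee" is not split in two.
df["Director"] = df["Director"].str.replace(" ", "", regex=False)

# Bag of words: one lowercase string per movie built from the selected columns.
df["bag_of_words"] = (
    df["Genre"] + " " + df["Director"] + " " + df["Plot"]
).str.lower()

# Drop everything except the title and the bag of words.
df = df[["Title", "bag_of_words"]]

# Count matrix: one row per movie, one column per distinct word.
count_matrix = CountVectorizer().fit_transform(df["bag_of_words"])

# Pairwise cosine similarity between all movies (an n x n matrix).
cosine_sim = cosine_similarity(count_matrix)
```

In this toy table the two science-fiction rows share most of their words, so their similarity is much higher than either one's similarity to the romance row.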
Finally, I filter the top 10 movie suggestions based on the user's search query (i.e., a movie title).
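A recommender of this kind could look like the sketch below. The function name `recommend`, its parameters, and the small hand-written similarity matrix are assumptions for illustration; in the project, the matrix would come from the cosine similarity step.

```python
import numpy as np

def recommend(title, titles, cosine_sim, top_n=10):
    """Return up to top_n titles most similar to `title` (hypothetical helper)."""
    if title not in titles:
        return []  # unknown title: nothing to recommend
    idx = titles.index(title)  # row of the queried movie (first match if duplicated)
    # Rank all movie indices by similarity to the queried movie, highest first.
    ranked = sorted(range(len(titles)), key=lambda i: cosine_sim[idx][i], reverse=True)
    # Skip the queried movie itself and keep the top_n remaining titles.
    return [titles[i] for i in ranked if i != idx][:top_n]

# Made-up titles and similarity values, only to show the call.
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
cosine_sim = np.array([
    [1.0, 0.8, 0.1, 0.4],
    [0.8, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.6],
    [0.4, 0.3, 0.6, 1.0],
])

print(recommend("Movie A", titles, cosine_sim, top_n=2))  # ['Movie B', 'Movie D']
```

Returning an empty list for an unknown title is one possible design choice; a real search box would probably want fuzzy matching on the title instead.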
I used rake_nltk for keyword extraction in this project. It can be installed with:
!pip install rake_nltk
This project covers the following concepts:
- Cosine Similarity
- Document Term Frequency
- Bag of Words
- Basic text pre-processing for NLP
- Basic usage of rake_nltk and Rake() class
- CountVectorizer() to convert Bag of words to numeric data
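To make cosine similarity, term frequency, and bag of words concrete, here is a small sketch that computes them by hand with the standard library. The two example strings and the helper name `cosine` are made up, and the whitespace tokenization is a simplification of what CountVectorizer() does.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    # Dot product over the words that occur in both documents.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    # An empty document has zero length, so define its similarity as 0.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

doc1 = "space crew explores distant galaxy"
doc2 = "space pilot explores distant planet"

# Term frequencies: how often each word occurs in each bag of words.
tf1 = Counter(doc1.split())
tf2 = Counter(doc2.split())

similarity = cosine(tf1, tf2)  # 3 shared words / (sqrt(5) * sqrt(5)) = 0.6
```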