- Dataset Link - Goodreads Books 100k 📊
- Alternate Dataset Link (requires additional preprocessing and changes in model) - GoodreadsBooks 📊
- Streamlit Documentation - API Reference 🧾
- Open the model.ipynb Jupyter notebook and run the model.
- After running, you will have 2 pickle files.
- Run apptest.py using the command mentioned in the first line of its code.
- Ensure all required libraries, especially Streamlit, are installed (use the latest versions from pip).
pip install numpy
pip install pandas
pip install scikit-learn
pip install nltk
pip install pickle5
- Recommendation Systems
- Data Preprocessing
- Choosing a Recommendation Algorithm
- Feature Extraction
- Data Preprocessing II
- Algorithm Application
- Deployment and User Interface
A recommendation system predicts and presents items that a user might find relevant based on preferences and behavior.
- Content-Based Filtering
- Principle: Recommends items similar to what the user has shown interest in.
- Approach: Analyzes item characteristics and user preferences.
- Example: Recommending movies based on genres or books based on content and genres.
- Collaborative Filtering
- Principle: Recommends items based on the preferences and behaviors of similar users.
- Approach: Utilizes user-item interaction data to identify patterns.
- Example: Suggesting products liked or purchased by users with similar tastes.
- Hybrid Filtering
- Principle: A combination of content-based and collaborative filtering.
- Approach: Recommends items based on user history and similar users' recommendations.
- Example: Facebook and LinkedIn use hybrid filtering for personalized content.
Filter unnecessary data fields and remove duplicates and empty values for clean data.
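The cleaning step might look like the following minimal sketch with pandas. The toy rows and the column names (`title`, `author`, `desc`, `isbn`) are assumptions for illustration and are not taken from the actual dataset.

```python
import pandas as pd

# Toy stand-in for the dataset; column names are hypothetical.
books = pd.DataFrame(
    {
        "title": ["Dune", "Dune", "Emma", "Ulysses"],
        "author": ["Frank Herbert", "Frank Herbert", "Jane Austen", "James Joyce"],
        "desc": ["Desert planet epic.", "Desert planet epic.", None, "A day in Dublin."],
        "isbn": ["111", "111", "222", "333"],  # a field the recommender does not need
    }
)

clean = (
    books[["title", "author", "desc"]]  # keep only the fields used later
    .drop_duplicates()                   # remove repeated rows
    .dropna()                            # remove rows with empty values
    .reset_index(drop=True)              # renumber the remaining rows
)
print(clean)
```

Here the duplicate "Dune" row and the "Emma" row with a missing description are dropped.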
Use the "Bag of Words" model, a text representation technique that converts titles, summaries, or reviews into a matrix of word frequencies.
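To make the idea concrete, here is a small from-scratch sketch of a bag-of-words matrix using only the standard library. The two example sentences are made up, and the project itself may build the matrix differently.

```python
from collections import Counter

# Two made-up documents standing in for book titles or summaries.
docs = ["the hobbit is a fantasy book", "a fantasy book about a ring"]

# Vocabulary: every distinct word across the documents, in sorted order.
vocab = sorted({word for doc in docs for word in doc.split()})

# One row per document, one column per vocabulary word, cell = word count.
matrix = []
for doc in docs:
    counts = Counter(doc.split())
    matrix.append([counts[word] for word in vocab])

print(vocab)
for row in matrix:
    print(row)
```

Word order is discarded: each document is described only by how often each vocabulary word occurs in it.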
Use book descriptions, authors' names, and genres to create a combined data field of tags for string matching.
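One possible way to build the combined field is sketched below. The column names and the choice to collapse spaces inside author and genre names (so that each name becomes a single token) are assumptions, not necessarily what the notebook does.

```python
import pandas as pd

# Hypothetical columns; the real dataset may name them differently.
books = pd.DataFrame(
    {
        "title": ["Dune", "Emma"],
        "author": ["Frank Herbert", "Jane Austen"],
        "genre": ["Science Fiction", "Romance"],
        "desc": ["A desert planet epic", "A matchmaker in Highbury"],
    }
)

# Collapse spaces inside names so "Frank Herbert" is matched as one token,
# then join description, author, and genre into a single lowercase string.
books["tags"] = (
    books["desc"]
    + " "
    + books["author"].str.replace(" ", "", regex=False)
    + " "
    + books["genre"].str.replace(" ", "", regex=False)
).str.lower()

print(books[["title", "tags"]])
```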
Apply the Porter Stemmer algorithm to stem the text, reducing words to their root form and minimizing redundancy.
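A minimal sketch of stemming with NLTK's `PorterStemmer` follows. The helper name `stem_text` is hypothetical, and splitting on whitespace is a simplifying assumption.

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()


def stem_text(text: str) -> str:
    """Stem every whitespace-separated word and join the results back together."""
    return " ".join(stemmer.stem(word) for word in text.split())


# Different inflections of the same word collapse to one root form.
print(stem_text("loved loving loves"))
```

After stemming, inflected forms of a word count as the same term when the tags are vectorized.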
Utilize CountVectorizer to convert text into vectors based on word frequency, creating vectors from tags of all books.
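The vectorization step could be sketched as follows. The toy tags and the `max_features` and `stop_words` settings are assumptions chosen for illustration; the notebook may use different values.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy tags; in the project these would come from the stemmed tags of every book.
tags = [
    "desert planet epic",
    "desert war epic",
    "romance village matchmaker",
]

# max_features caps the vocabulary size; stop_words drops common English words.
cv = CountVectorizer(max_features=5000, stop_words="english")
vectors = cv.fit_transform(tags).toarray()

print(sorted(cv.vocabulary_))  # the learned vocabulary
print(vectors)                 # one row per book, one column per word
```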
- Euclidean Distance
- Measures straight-line distance between two points in Euclidean space.
- Sensitive to scale and affected by the curse of dimensionality.
- Cosine Similarity
- Measures cosine of the angle between two vectors.
- Scale-invariant and suitable for high-dimensional spaces.
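The difference between the two measures can be seen in a small standard-library sketch. The vectors are made up: the second has the same word proportions as the first but twice the counts, as a longer description of a similar book might.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


short_doc = [1, 2, 3]
long_doc = [2, 4, 6]  # same proportions, twice the word counts

print(euclidean_distance(short_doc, long_doc))  # grows with document length
print(cosine_similarity(short_doc, long_doc))   # depends only on direction
```

The Euclidean distance is non-zero even though the two vectors point in the same direction, while the cosine similarity is (up to rounding) 1.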
Apply cosine similarity to the book vectors to get a similarity matrix. Create a function that returns the 6 nearest vectors, i.e. those with the least cosine distance.
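A hedged sketch of this step is shown below. The toy titles and tags, the function name `recommend`, and the decision to exclude the queried book from its own results are assumptions, not the notebook's exact code.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy data standing in for the real titles and their tags.
titles = ["Dune", "Dune Messiah", "Emma", "Persuasion"]
tags = [
    "desert planet spice epic",
    "desert planet spice emperor",
    "romance village matchmaker",
    "romance navy second chance",
]

vectors = CountVectorizer().fit_transform(tags)
similarity = cosine_similarity(vectors)  # shape: (n_books, n_books)


def recommend(title, k=6):
    """Return up to k titles most similar to `title`, excluding the book itself."""
    idx = titles.index(title)
    # Highest similarity first, which is the same as least cosine distance first.
    order = np.argsort(-similarity[idx], kind="stable")
    return [titles[i] for i in order if i != idx][:k]


print(recommend("Dune"))
```

With only four toy books, `recommend("Dune")` can return at most three titles; on the full dataset the default `k=6` would return six.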
Use Streamlit for the frontend; it provides a simple way to create a user interface for a Python application. Streamlit also offers free hosting, so the app can be reached from any device with a web browser.