Project Title: Cosine Similarity based Chatbot
This project is a simple chatbot that uses cosine similarity for question answering. It performs tokenization and stopword removal on the user's input, then matches the input to a pre-defined list of questions using cosine similarity. If a match is found, the corresponding answer is returned to the user. If no match is found, the chatbot responds with "We can't answer this".
These instructions will help you get the project up and running on your local machine for development and testing purposes.
You will need the following to run the project -
- Python 3.x
- nltk
- numpy
Use the following command to install the necessary packages:
pip install nltk numpy
- Clone the repository to your local machine
- Navigate to the project directory
- Run the following command to start the chatbot:
python app.py
The chatbot employs cosine similarity to find the best match for the user's input. The pre-defined list of questions and their corresponding answers is stored in the test.csv file, which is loaded into memory when the program starts. When the user enters a question, the following steps are performed:
- Tokenization - The user's input is tokenized, which means it is split into individual words or tokens.
- Stopword removal - Stopwords are words like "is", "the", "a", etc. which do not carry any significant meaning in the sentence. They are removed to reduce the noise in the text.
- Cosine similarity - The cosine similarity between the user's input and each question in the data set is calculated. The question with the highest cosine similarity score is chosen as the match for the user's input.
- Response - If a match is found, the corresponding answer is returned to the user. If no match is found, the chatbot responds with "We can't answer this".
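The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: it uses only the Python standard library so it runs without any downloads, whereas the project itself relies on nltk and numpy. The small stopword set, the sample question/answer pairs, the function names, and the `threshold` parameter that decides when "no match is found" are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword set (assumption); nltk ships a much larger list.
STOPWORDS = {"is", "the", "a", "an", "of", "to", "what", "do", "i", "how"}

FALLBACK = "We can't answer this"


def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction, so treat it as no overlap
    return dot / (norm_a * norm_b)


def answer(user_input, qa_pairs, threshold=0.5):
    """Return the answer of the most similar question, or the fallback.

    threshold is a hypothetical cut-off below which no match is reported.
    """
    query = preprocess(user_input)
    best_score, best_answer = 0.0, FALLBACK
    for question, reply in qa_pairs:
        score = cosine_similarity(query, preprocess(question))
        if score > best_score:
            best_score, best_answer = score, reply
    return best_answer if best_score >= threshold else FALLBACK


# Hypothetical stand-in for the rows loaded from test.csv.
qa_pairs = [
    ("What is the return policy?", "Returns are accepted within 30 days."),
    ("How do I reset my password?", "Use the reset link on the login page."),
]

print(answer("Tell me the return policy", qa_pairs))
print(answer("Where is the office?", qa_pairs))
```

With these sample pairs, the first query shares "return" and "policy" with the first stored question and so gets its answer, while the second query shares no tokens with any stored question and falls back to "We can't answer this". A cut-off such as `threshold` is one way to distinguish "best match" from "no match"; the value 0.5 here is arbitrary.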
The current version of the chatbot is a simple prototype. There are several avenues for improvement, such as -
- Using more advanced algorithms for matching the user's input to the pre-defined questions.
- Integrating natural language processing (NLP) techniques for better understanding of the user's input.
- Implementing a feedback system for users to rate the relevance of the answers provided by the chatbot.
- Calling ChatGPT (via the OpenAI API) to get a relevant answer when the question is not covered by the Q&A dataset.