This project is a web application that classifies SMS or email messages as spam or not spam. The frontend is built with Streamlit, and classification is performed by a Multinomial Naive Bayes model. Messages are preprocessed with NLTK (Natural Language Processing steps such as tokenization and stemming) before they are passed to the model.
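To make the classification step concrete, here is a dependency-free sketch of the Multinomial Naive Bayes decision rule with add-one smoothing. It is illustrative only: the project itself relies on scikit-learn, and the function names `train_nb` and `predict_nb` and the toy messages are assumptions made for this example.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from tokenized docs."""
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    token_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    model = {}
    for c in class_counts:
        # Additive smoothing keeps unseen (token, class) pairs from scoring zero.
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        model[c] = {
            "log_prior": math.log(class_counts[c] / len(labels)),
            "log_lik": {
                t: math.log((token_counts[c][t] + alpha) / denom) for t in vocab
            },
        }
    return model


def predict_nb(model, doc):
    """Return the class with the highest log posterior; unknown tokens are skipped."""
    def score(c):
        params = model[c]
        return params["log_prior"] + sum(
            params["log_lik"][t] for t in doc if t in params["log_lik"]
        )
    return max(model, key=score)


# Hypothetical toy data, already tokenized.
docs = [
    ["win", "free", "prize"],
    ["free", "cash", "now"],
    ["lunch", "at", "noon"],
    ["see", "you", "tomorrow"],
]
labels = ["spam", "spam", "ham", "ham"]

model = train_nb(docs, labels)
print(predict_nb(model, ["free", "prize", "today"]))  # prints "spam"
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.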
- `app.py`: The main Streamlit application script. This file contains the code for the web interface and handles user input and the display of classification results.
- `model.py`: This script trains the Naive Bayes model on the provided dataset and saves the trained model for use in the web application.
- `vectorizer.py`: This script contains the code for text vectorization. It converts the text data into a numerical format that the machine learning algorithm can process.
- Data Files: The dataset used for training and evaluation is sourced from Kaggle. The data files should be placed in the project directory.
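The vectorize, train, and save steps described above could fit together roughly as follows. This is a minimal sketch under several assumptions: it uses `CountVectorizer` (the README does not say which vectorizer the project uses), a four-message toy dataset stands in for the Kaggle data, and the file names `vectorizer.pkl` and `model.pkl` and the helper `classify` are placeholders.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in for the Kaggle dataset (hypothetical messages, not real data).
messages = [
    "win free prize now",
    "free cash claim now",
    "see you at lunch",
    "are we meeting tomorrow",
]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

# Convert raw text into a sparse matrix of token counts.
vectorizer = CountVectorizer()
features = vectorizer.fit_transform(messages)

# Fit the classifier on the count features.
model = MultinomialNB()
model.fit(features, labels)

# Save both objects so the web application can load them later.
# The file names are placeholders chosen for this example.
with open("vectorizer.pkl", "wb") as f:
    pickle.dump(vectorizer, f)
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)


def classify(text, vectorizer, model):
    """Vectorize one message and map the prediction to a display label."""
    prediction = model.predict(vectorizer.transform([text]))[0]
    return "SPAM" if prediction == 1 else "NOT SPAM"


# Reload the saved objects, as the application would at startup.
with open("vectorizer.pkl", "rb") as f:
    loaded_vectorizer = pickle.load(f)
with open("model.pkl", "rb") as f:
    loaded_model = pickle.load(f)

print(classify("free prize", loaded_vectorizer, loaded_model))
```

The vectorizer is saved alongside the model because new messages must be mapped to the same vocabulary the model was trained on.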
- Clone the Repository:

      git clone https://github.com/rjsandeepkumawat/spam_classifier.git
      cd spam_classifier

- Install Dependencies: Ensure you have Python 3.x installed. Install the required libraries using pip:

      pip install streamlit scikit-learn nltk pandas

- Run the Application: Start the Streamlit web application by running:

      streamlit run app.py

- Open your browser and navigate to the local server address provided by Streamlit (usually `http://localhost:8501`).
- Enter the SMS or email message you want to classify into the text input box.
- Click the "Predict" button to see if the message is classified as "SPAM" or "NOT SPAM".
- Algorithm: Multinomial Naive Bayes
- Library: scikit-learn
- NLP Tasks: Tokenization, stopword removal, stemming (using NLTK)
- Ensure that the NLTK resources (`punkt` and `stopwords`) are downloaded before running the application.
- The data used is publicly available on Kaggle and should be appropriately licensed for use.
If you wish to contribute to this project, please fork the repository and submit a pull request with your changes.