Spam Detector is a web-based application that allows users to train a machine learning model to detect spam messages. It supports two feature extraction methods: Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF).
- General info
- Deployment on PythonAnywhere
- Data
- How it works
- Technologies
- Requirements
- Setup
- Deployment locally
- Contributing
This project is an SMS spam detector implemented as a Flask web application. It provides a user-friendly interface to train and evaluate a machine learning model for spam detection.
This application is deployed on PythonAnywhere, a cloud-based Python hosting service. You can access the live application at adriendimitri.pythonanywhere.com.
The available data is the SMS Spam Collection Dataset from the UC Irvine Machine Learning Repository.
The SMS Spam Collection v.1 (hereafter the corpus) is a set of tagged SMS messages collected for SMS spam research. It contains 5,574 SMS messages in English, each classified as either ham (legitimate) or spam. The first word of each line is the classification, followed by a space, and the rest of the line is the SMS itself.
The SMS Spam Collection v.1 (text file: smsspamcollection) contains 4,827 legitimate SMS messages (86.6%) and 747 spam messages (13.4%).
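The line format and class balance described above can be sketched in a few lines of Python. This is a minimal illustration, not the application's own loader: the helper names `parse_line` and `class_shares` are hypothetical, and the two sample lines are made up rather than taken from the corpus.

```python
from collections import Counter


def parse_line(line):
    """Split one corpus line into (label, message)."""
    # Split on the first run of whitespace only, so the message text stays intact.
    label, message = line.rstrip("\n").split(maxsplit=1)
    if label not in ("ham", "spam"):
        raise ValueError(f"unexpected label: {label!r}")
    return label, message


def class_shares(labels):
    """Return each label's share of the corpus as a percentage, rounded to 0.1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Two made-up lines in the corpus format (not taken from the dataset).
sample = [
    "ham See you at lunch tomorrow\n",
    "spam WIN a FREE prize now, reply YES\n",
]
parsed = [parse_line(line) for line in sample]

# Recompute the class balance from the message counts quoted above.
shares = class_shares(["ham"] * 4827 + ["spam"] * 747)
print(parsed)
print(shares)
```

Splitting on the first whitespace run means the same helper also accepts a tab between the label and the message, should the file use one.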
For feature extraction, you can choose either Bag of Words (BoW) or Term Frequency-Inverse Document Frequency (TF-IDF). Both give good results on this dataset.
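To make the difference between the two feature types concrete, here is a small standard-library sketch. It assumes whitespace tokenization and the plain `tf * log(N / df)` weighting; the application may use a library vectorizer with different tokenization, smoothing, or normalization. The function names and the three example messages are illustrative only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, vectors) with one raw term-count vector per document."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, vectors) weighted as tf * log(N / df)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = []
    for row in counts:
        length = sum(row) or 1  # avoid dividing by zero for an empty message
        weighted.append([(c / length) * w for c, w in zip(row, idf)])
    return vocab, weighted


# Made-up messages, for illustration only.
docs = ["free prize call now", "call me now", "free free prize"]
vocab, bow = bag_of_words(docs)
_, tfidf = tf_idf(docs)
print(vocab)
print(bow)
print(tfidf)
```

BoW keeps raw counts, so a repeated word simply gets a larger number. TF-IDF additionally down-weights words that appear in many messages, so a rare word such as "me" outweighs a common one such as "call" within the same message.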
A Naive Bayes classifier is the core of the model. During training, it estimates and stores the parameters for the spam and ham classes from the features extracted in the previous step.
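The following sketch shows what "storing the parameters" can look like for a multinomial Naive Bayes model over word counts. It is an assumption-laden illustration, not the project's code: the `NaiveBayes` class, the add-one (Laplace) smoothing, and the training messages are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one smoothing."""

    def fit(self, messages, labels):
        # The stored parameters are plain counts per class.
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.class_counts = Counter(labels)      # label -> number of messages
        for text, label in zip(messages, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_score(self, text, label):
        """Return log P(label) plus the sum of log P(word | label)."""
        total_msgs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_msgs)
        total_words = sum(self.word_counts[label].values())
        for word in text.lower().split():
            if word not in self.vocab:
                continue  # words never seen in training carry no evidence
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def predict(self, text):
        """Return the label with the highest log score."""
        return max(self.class_counts, key=lambda label: self.log_score(text, label))


# Made-up training messages, for illustration only.
train = [
    ("win a free prize now", "spam"),
    ("free cash claim your prize", "spam"),
    ("are we still on for lunch", "ham"),
    ("call me when you get home", "ham"),
]
model = NaiveBayes().fit([m for m, _ in train], [l for _, l in train])
print(model.predict("claim your free prize"))
print(model.predict("lunch at home"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the add-one smoothing keeps a word seen in only one class from forcing the other class's probability to zero.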
- Clone the repository.
- Navigate to the project directory:
  $ cd spam-detector
- Create a virtual environment (optional but recommended):
  $ python -m venv .venv
- Activate the virtual environment:
  - On Windows:
    $ .venv\Scripts\activate
  - On macOS and Linux:
    $ source .venv/bin/activate
- Install the required packages using the command below:
  $ pip install -r requirements.txt
- Run the Flask application:
  $ python run.py
Note: It's recommended to use a virtual environment to isolate the project dependencies. If you choose not to use a virtual environment, make sure to adapt the installation command (pip install -r requirements.txt) accordingly.
Access the application by navigating to http://localhost:5000 in your web browser.
If you'd like to contribute to the development of the SMS Spam Detector, please follow the guidelines in CONTRIBUTING.md.