An email spam detection system that identifies spam emails using natural language processing (NLP) and machine learning, implemented in Python. Given a dataset of labeled emails, we extract key words from each message and train a Naive Bayes classifier to determine whether an email is spam or not.
List of required packages and modules to install for the project :
- Python 3.x
- Pandas
- NumPy
- Scikit-learn
- NLTK
Install all required packages :
pip install -r requirements.txt
The email dataset contains about 5728 records. Each record holds a sample email and a target column "spam" that indicates whether the email is spam or not.
The project code is divided into the following sections:
Section 1 | The Data :
In this section we prepare the dataset before training the model on it, with steps like :
- Data Loading: Load the dataset
- Data Visualization: Visualize dataset features
- Data Cleaning: Remove stopwords and duplicate values
- Data Splitting: Split the dataset into training and testing sets
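The cleaning and splitting steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the rows are made up, the `text` column name is an assumption (only the `spam` column is named in this README), and scikit-learn's built-in English stop word list stands in for the NLTK stopwords so the snippet runs without downloading a corpus.

```python
import pandas as pd
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
from sklearn.model_selection import train_test_split

# Tiny in-memory stand-in for the real dataset (hypothetical rows).
# "spam" is the target column; the "text" column name is assumed.
df = pd.DataFrame({
    "text": [
        "Win a free prize now",
        "Win a free prize now",  # exact duplicate, dropped below
        "Meeting moved to Monday at noon",
        "Cheap loans approved instantly",
        "Please review the attached report",
        "Claim your free vacation today",
        "Lunch tomorrow with the team",
    ],
    "spam": [1, 1, 0, 1, 0, 1, 0],
})

# Data cleaning: remove duplicate rows, keeping the first occurrence.
df = df.drop_duplicates().reset_index(drop=True)


def remove_stopwords(text):
    """Lowercase the text and drop common English stop words."""
    words = text.lower().split()
    return " ".join(w for w in words if w not in ENGLISH_STOP_WORDS)


df["clean_text"] = df["text"].apply(remove_stopwords)

# Data splitting: hold out about a third of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    df["clean_text"], df["spam"], test_size=0.33, random_state=42
)

print(df[["clean_text", "spam"]])
print("train size:", len(X_train), "test size:", len(X_test))
```

With the real dataset, the in-memory frame would be replaced by loading the project's data file, and the NLTK stopword list could be swapped in for `ENGLISH_STOP_WORDS`.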
Section 2 | The Model :
With the dataset ready for training, we create a Naive Bayes classifier using scikit-learn and then fit it to the data. Finally, we evaluate the model using accuracy, a classification report, and a confusion matrix.
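As a rough sketch of the training and evaluation steps described above, the snippet below fits a Naive Bayes model on word counts and prints the three metrics. The choice of `CountVectorizer` and `MultinomialNB` is an assumption (this README only says a Naive Bayes classifier from scikit-learn is used), and the example emails are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training and testing emails (1 = spam, 0 = not spam).
train_texts = [
    "win free prize now",
    "cheap loans approved instantly",
    "claim free vacation today",
    "meeting moved monday noon",
    "please review attached report",
    "lunch tomorrow team",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["free prize today", "team meeting report"]
test_labels = [1, 0]

# Turn each email into a vector of word counts.
# The vocabulary is learned from the training set only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Fit the (assumed) multinomial Naive Bayes classifier and predict.
model = MultinomialNB()
model.fit(X_train, train_labels)
predictions = model.predict(X_test)

# Evaluate with accuracy, classification report and confusion matrix.
accuracy = accuracy_score(test_labels, predictions)
report = classification_report(test_labels, predictions, zero_division=0)
matrix = confusion_matrix(test_labels, predictions, labels=[0, 1])

print("Accuracy:", accuracy)
print(report)
print(matrix)
```

In the confusion matrix, rows are the true labels and columns are the predicted labels, in the order given by `labels=[0, 1]`.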
- Clone the repo
git clone https://github.com/omaarelsherif/Email-Spam-Detection-Using-Machine-Learning.git
- Open 'main.ipynb' in Google Colab or VS Code and enjoy
These links may help you better understand the project idea and the techniques used :
- Spam detection in machine learning: https://bit.ly/3nwiKtA
- Naive-Bayes algorithm: https://bit.ly/3zc9SLH
- Model evaluation: https://bit.ly/3B12VOO