Bayesian Classifier

I wrote a program to classify sets of email features into spam and not-spam cases using the Spambase data set and a naive Bayesian classifier.

First, the mean and standard deviation of each email feature are calculated from the training data, along with the prior probabilities. These values parameterize a Gaussian Naive Bayes classifier, which is then applied to the test data. The predictions are then compared to the true labels.
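The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names `fit_gaussian_nb` and `predict_gaussian_nb`, the `min_std` floor, and the toy data are all assumptions made for the example. It also assumes that the statistics are computed separately for each class, and it uses log probabilities to avoid numerical underflow.

```python
import numpy as np

def fit_gaussian_nb(X, y, min_std=1e-4):
    """Estimate class priors plus per-class feature means and standard deviations."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Floor the standard deviation (hypothetical min_std) so that a
    # constant feature does not cause a division by zero.
    stds = np.array([np.maximum(X[y == c].std(axis=0), min_std) for c in classes])
    return classes, priors, means, stds

def predict_gaussian_nb(X, classes, priors, means, stds):
    """Pick the class with the largest log prior plus summed log likelihoods."""
    # X is (n, d); means and stds are (k, d); broadcasting gives (n, k, d).
    z = (X[:, None, :] - means[None, :, :]) / stds[None, :, :]
    log_pdf = -0.5 * z**2 - np.log(stds[None, :, :]) - 0.5 * np.log(2 * np.pi)
    scores = np.log(priors)[None, :] + log_pdf.sum(axis=2)
    return classes[np.argmax(scores, axis=1)]

# Tiny synthetic example (made-up numbers, not Spambase).
X_train = np.array([[0.0, 1.0], [0.2, 0.8], [3.0, 5.0], [3.2, 5.2]])
y_train = np.array([0, 0, 1, 1])
model = fit_gaussian_nb(X_train, y_train)
preds = predict_gaussian_nb(np.array([[0.1, 0.9], [3.1, 5.1]]), *model)
```

Summing the per-feature log likelihoods is where the "naive" independence assumption enters: each feature contributes to the score separately, given the class.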

For the test data, it produced an accuracy of 78.4%, a precision of 68.6%, and a recall of 83.6% (all values are rounded).
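For reference, these three metrics can all be derived from the four cells of a binary confusion matrix. The sketch below assumes spam is the positive class (label 1); the helper name `binary_metrics` and the toy labels are hypothetical and are not the results reported above.

```python
import numpy as np

def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and accuracy, precision, recall."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))  # spam caught
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))  # non-spam flagged
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))  # spam missed
    tn = int(np.sum((y_pred != positive) & (y_true != positive)))  # non-spam passed
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision, "recall": recall}

# Toy labels for illustration only.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 1, 0, 0, 0])
```

Precision drops when false positives (non-spam flagged as spam) grow, while recall drops when false negatives (missed spam) grow.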

The confusion matrix is shown in confusion.png.

The overall accuracy was fairly good and the recall was high, but the precision was low relative to the other two values. This trade-off may be acceptable in a spam filter, since a low rate of false negatives means less spam ends up in an inbox. However, low precision means that non-spam is being classified as spam, which could lead to frustration.

I do not think that the features are truly independent, as the naive Bayes model assumes. I suspect there are groups of features in which the presence or absence of a small number of features determines whether or not an email is spam. The classifier would likely be more successful if it could account for these cases. It does better than chance, but should be improved.
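One rough way to probe the independence concern is to look for strongly correlated feature pairs. The sketch below is an assumption-laden illustration rather than an analysis performed in this project: the helper `strongly_correlated_pairs`, the 0.8 threshold, and the toy matrix are all hypothetical. Correlation only captures linear dependence, and because naive Bayes assumes independence given the class, the check would ideally be run on the spam and non-spam rows separately.

```python
import numpy as np

def strongly_correlated_pairs(X, threshold=0.8):
    """Return (i, j, r) for feature pairs whose |Pearson r| meets the threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are features
    d = corr.shape[0]
    pairs = []
    for i in range(d):
        for j in range(i + 1, d):
            if abs(corr[i, j]) >= threshold:
                pairs.append((i, j, float(corr[i, j])))
    return pairs

# Toy data: column 1 is exactly twice column 0; column 2 is only weakly related.
X_toy = np.array([[1.0, 2.0, 5.0],
                  [2.0, 4.0, 1.0],
                  [3.0, 6.0, 4.0],
                  [4.0, 8.0, 2.0]])
pairs = strongly_correlated_pairs(X_toy)
```

Many strongly correlated pairs within a class would suggest that the independence assumption is being violated for those features.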

The data set that this classifier was trained on was fairly limited in its scope and in the number of features per email. A classifier trained on it does not seem likely to generalize to other contexts, such as other workplaces or groups of students.

IntoTheVortex/Bayesian-Classifier

Folders and files

Latest commit

History

Repository files navigation

Bayesian Classifier

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages