AuthorIdentification

Authorship identification estimates how likely it is that a piece of writing was produced by a particular author, based on other writings by that author. It is a classification problem whose difficulty depends on several parameters, such as the feature set used, the size of the training data, the number of candidate authors, the number of writings per author, and the type of classification model. This project classifies the authors of online messages taken from Reddit and the Enron Email Dataset. A 2-way SVM classifier achieved an accuracy of 83.5% on the Enron Email Dataset and 74% on the Reddit dataset. The classification parameters were then varied to compare their effect on classification accuracy.
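To make the idea of a 2-way SVM classifier concrete, here is a minimal sketch in pure Python. It is not the project's code. The bag-of-words features, the Pegasos-style hinge-loss training loop, the function names (`featurize`, `train_linear_svm`, `predict`) and the toy messages are all assumptions made for illustration. The project's actual feature set, SVM implementation and data loading are not described here.

```python
import random
from collections import Counter


def featurize(text):
    """Bag-of-words counts over lowercased whitespace tokens (assumed feature set)."""
    return Counter(text.lower().split())


def score(weights, features):
    """Dot product between the weight vector and a sparse feature vector."""
    return sum(weights.get(tok, 0.0) * cnt for tok, cnt in features.items())


def train_linear_svm(samples, labels, epochs=200, lam=0.01, seed=0):
    """Linear SVM without a bias term, trained by stochastic subgradient
    descent on the regularized hinge loss (Pegasos-style). Labels are +1 or -1."""
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(samples)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)      # decaying learning rate
            margin = labels[i] * score(weights, samples[i])
            shrink = 1.0 - 1.0 / step     # regularization shrink, equals 1 - eta * lam
            for tok in weights:
                weights[tok] *= shrink
            if margin < 1.0:              # hinge loss is active for this sample
                for tok, cnt in samples[i].items():
                    weights[tok] = weights.get(tok, 0.0) + eta * labels[i] * cnt
    return weights


def predict(weights, text):
    """Return +1 for the first author and -1 for the second."""
    return 1 if score(weights, featurize(text)) >= 0.0 else -1


# Toy messages for two hypothetical authors; NOT taken from Enron or Reddit.
TRAIN = [
    ("regards please find the quarterly report attached", 1),
    ("regards the meeting is moved to friday", 1),
    ("regards please review the contract draft", 1),
    ("lol that game was awesome last night", -1),
    ("lol i cannot believe this thread", -1),
    ("lol best comment ever honestly", -1),
]

weights = train_linear_svm([featurize(t) for t, _ in TRAIN], [y for _, y in TRAIN])
train_accuracy = sum(predict(weights, t) == y for t, y in TRAIN) / len(TRAIN)
```

Calling `predict(weights, "regards please send the report")` attributes the message to the first toy author. Swapping `featurize` for a different feature extractor, or changing the number of messages per author, is the kind of parameter change the comparison above refers to.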
