AuthorIdentification

Authorship identification estimates how likely it is that a piece of writing was produced by a particular author, based on other writings by that author. It is a classification problem whose difficulty depends on several parameters, such as the feature set used, the size of the training data, the number of authors considered, the number of writings per author, and the type of classification model. This project classifies the authors of online messages taken from Reddit and the Enron Email Dataset. A 2-way SVM classifier was developed that achieved an accuracy of 83.5% on the Enron Email Dataset and 74% on the Reddit dataset. The classification parameters were then varied to compare their effect on classification accuracy.
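
As a rough illustration of what a 2-way SVM author classifier can look like, here is a minimal sketch using scikit-learn. It is not the project's actual code: the README does not state which library or feature set was used, so the character n-gram TF-IDF features, the linear kernel, the helper name `build_author_classifier`, and the toy messages below are all assumptions chosen for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_author_classifier():
    """Return an untrained two-author classifier (hypothetical setup)."""
    return make_pipeline(
        # Assumption: character n-grams as stylometric features.
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        # Assumption: a linear-kernel SVM; the kernel is one of the
        # "classification parameters" that could be varied.
        SVC(kernel="linear"),
    )


# Toy stand-in data; real experiments would load Reddit or Enron messages.
train_texts = [
    "Please find the attached report and let me know your thoughts.",
    "Kindly review the attached schedule before our meeting.",
    "Please confirm receipt of the attached contract.",
    "Kindly let me know if the attached figures look right.",
    "lol yeah that movie was awesome, gonna watch it again",
    "haha no way, that game was so bad lol",
    "yeah gonna grab food later, wanna come?",
    "lol that was awesome, totally gonna try it",
]
train_authors = ["author_a"] * 4 + ["author_b"] * 4

held_out_texts = [
    "Please review the attached notes and confirm.",
    "haha yeah gonna skip that one lol",
]
held_out_authors = ["author_a", "author_b"]

model = build_author_classifier()
model.fit(train_texts, train_authors)

predictions = model.predict(held_out_texts)
accuracy = accuracy_score(held_out_authors, predictions)
print(f"held-out accuracy: {accuracy:.2f}")
```

Swapping the vectorizer settings, the kernel, or the amount of training text per author in this sketch is one way to reproduce the kind of parameter comparison described above.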