Fatihkeskin/Authorship-Dedection-with-Statictical-Hidden-Markov-Models


The aim of this program is to determine the authors of given essays using language models built from a corpus. First, the program builds several language models from the given corpus (the Federalist Papers); these models are then used for two tasks: text generation and authorship detection. Building the language models well is the key to both tasks, because failing to tokenize the end-of-sentence characters or to remove stray characters (* ] -) lowers the resulting probabilities. For both tasks I arrived at some observations: text generated randomly from unigrams shows that each word is drawn independently of the others; text generated from bigrams shows a noticeable relationship between adjacent pairs of words; and text generated from trigrams shows the same kind of relationship across three-word sequences.
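The model-building and generation steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names (`tokenize`, `build_ngram_counts`, `generate`), the regular expression used for tokenization, and the `<s>` start marker are all assumptions made for the example.

```python
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text; keep words and end-of-sentence marks as tokens.

    Stray symbols such as * ] - match neither pattern and are dropped.
    (Hypothetical tokenizer; the original program may differ.)
    """
    return re.findall(r"[a-z']+|[.!?]", text.lower())


def build_ngram_counts(tokens, n):
    """Map each (n-1)-token context to a Counter of the tokens following it."""
    counts = defaultdict(Counter)
    padded = ["<s>"] * (n - 1) + tokens  # pad so the first word has a context
    for i in range(len(padded) - n + 1):
        context = tuple(padded[i:i + n - 1])
        counts[context][padded[i + n - 1]] += 1
    return counts


def generate(counts, n, max_tokens=20, seed=0):
    """Sample tokens one at a time, weighted by the counts for the context."""
    rng = random.Random(seed)
    context = ("<s>",) * (n - 1)  # empty tuple for unigrams
    output = []
    for _ in range(max_tokens):
        followers = counts.get(context)
        if not followers:
            break  # context never seen in training
        words = list(followers)
        weights = [followers[w] for w in words]
        word = rng.choices(words, weights=weights)[0]
        output.append(word)
        # Slide the context window; for unigrams it stays empty.
        context = (context + (word,))[1:]
    return output


if __name__ == "__main__":
    # Made-up sample text, not taken from the corpus.
    sample = "The people govern. * ] - The people decide!"
    tokens = tokenize(sample)
    for n in (1, 2, 3):
        model = build_ngram_counts(tokens, n)
        print(n, " ".join(generate(model, n, max_tokens=8, seed=n)))
```

With `n=1` every word is sampled from the same empty context, which is why unigram output looks like independent draws; with `n=2` or `n=3` each word is conditioned on the one or two words before it.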

In general, I tried to capture the characteristics (writing style, frequently used words, word order) of both authors by applying language models. As expected, the essay probabilities became higher as the n-gram order increased (an essay is less probable under the unigram model than under the trigram model). I think it was fun work and I enjoyed the process.
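One way the authorship-detection step could work is to train one model per author and assign an essay to the author whose model gives it the higher probability. The sketch below assumes a bigram model with add-one (Laplace) smoothing and log probabilities; the README does not say which smoothing or scoring the program actually uses, and the names `BigramModel`, `train`, and `detect_author`, the author labels, and the training sentences are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase words plus end-of-sentence marks."""
    return re.findall(r"[a-z']+|[.!?]", text.lower())


class BigramModel:
    """Bigram model with add-one smoothing (an assumption, not from the source)."""

    def __init__(self, tokens, vocab_size):
        self.contexts = Counter(tokens[:-1])          # counts of first words
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.vocab_size = vocab_size

    def log_prob(self, tokens):
        """Sum of log P(word | previous word) over the token sequence."""
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            numerator = self.bigrams[(prev, word)] + 1
            denominator = self.contexts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def train(texts_by_author):
    """Build one model per author over a shared vocabulary."""
    tokens_by_author = {a: tokenize(t) for a, t in texts_by_author.items()}
    vocab = set().union(*tokens_by_author.values())
    vocab_size = len(vocab) + 1  # one extra slot for unseen words
    return {a: BigramModel(toks, vocab_size)
            for a, toks in tokens_by_author.items()}


def detect_author(models, text):
    """Return the best-scoring author and every author's log probability."""
    tokens = tokenize(text)
    scores = {author: model.log_prob(tokens) for author, model in models.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy training text, invented for this example (not corpus excerpts).
    models = train({
        "author_a": "the union must be preserved . the union protects the people .",
        "author_b": "energy in the executive is essential . the executive must act .",
    })
    author, scores = detect_author(models, "the union protects the people")
    print(author, scores)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied over a whole essay, and sharing one vocabulary size across authors keeps the smoothed scores comparable.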
