GitHub - Fatihkeskin/Authorship-Dedection-with-Statictical-Hidden-Markov-Models

The aim of this program is to determine the authors of given essays using language models built by the program. First, the program builds language models from a given corpus (the Federalist Papers); these models are then used for tasks such as text generation and authorship detection. Building the language models well is the key to both tasks, because failing to tokenize end-of-sentence characters or to delete stray characters (* ] -) lowers the probabilities. For both tasks I reached some results: text generated randomly from unigrams shows that the words are independent of one another; text generated from bigrams shows a noticeable relationship between pairs of adjacent words; and text generated from trigrams shows the same for word triples.
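The model-building and text-generation steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in HMM.py: the function names `tokenize`, `build_ngram_model`, and `generate`, the `<s>` and `</s>` boundary markers, and the regular expressions used for cleaning are all assumptions made for the example.

```python
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text, drop stray symbols such as * ] -, and mark sentence boundaries."""
    tokens = []
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z']+", sentence)
        if words:
            tokens.extend(["<s>"] + words + ["</s>"])
    return tokens


def build_ngram_model(tokens, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context][tokens[i + n - 1]] += 1
    return model


def generate(model, n, max_words=30, seed=0):
    """Sample words in proportion to their counts until </s> or max_words is reached."""
    rng = random.Random(seed)
    # Start from a context that opens a sentence (the empty context for unigrams).
    starts = sorted(c for c in model if not c or c[0] == "<s>")
    context = rng.choice(starts)
    output = [w for w in context if w != "<s>"]
    while len(output) < max_words:
        followers = model.get(context)
        if not followers:
            break
        words, counts = zip(*sorted(followers.items()))
        word = rng.choices(words, weights=counts)[0]
        if word == "</s>":
            break
        if word != "<s>":
            output.append(word)
        # Slide the context window; unigrams keep the empty context.
        context = (context + (word,))[1:] if n > 1 else ()
    return " ".join(output)


tokens = tokenize("The people are free. The people vote!")
bigram_model = build_ngram_model(tokens, 2)
print(generate(bigram_model, 2, seed=1))
```

With `n=1` every word is drawn from the same empty context, so consecutive words are unrelated; with `n=2` or `n=3` each word is drawn conditionally on the one or two words before it, which is where the pairwise and triple relationships in the generated text come from.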

In general, I tried to capture the characteristics (writing style, frequently used words, word order) of both authors by applying language models. As expected, the essay probabilities rose as the n-gram order increased (the unigram model gives a lower probability than the trigram model). I think it was fun work, and I enjoyed the process.
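One way to turn per-author language models into an authorship decision is to score an essay under each author's model and pick the higher log-probability. The sketch below assumes a bigram model with add-one (Laplace) smoothing, which the README does not confirm; the helper names `bigram_counts`, `log_prob`, and `guess_author` and the author labels are hypothetical.

```python
import math
from collections import Counter


def bigram_counts(tokens):
    """Return (bigram counts, context counts) for a list of tokens."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])
    return bigrams, contexts


def log_prob(tokens, bigrams, contexts, vocab_size):
    """Add-one smoothed bigram log-probability of a token sequence (smoothing is an assumption)."""
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
    return total


def guess_author(essay_tokens, training):
    """training maps a (hypothetical) author label to that author's training tokens."""
    vocab = {w for toks in training.values() for w in toks} | set(essay_tokens)
    scores = {}
    for author, toks in training.items():
        bigrams, contexts = bigram_counts(toks)
        scores[author] = log_prob(essay_tokens, bigrams, contexts, len(vocab))
    best = max(scores, key=scores.get)
    return best, scores


training = {
    "author_a": "the people of the state".split(),
    "author_b": "a government of laws".split(),
}
best, scores = guess_author("the people of the".split(), training)
print(best, scores)
```

Log-probabilities are summed rather than multiplying raw probabilities so that long essays do not underflow to zero; the author whose model assigns the larger (less negative) score is taken as the more likely writer.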

