ViterbiAlgorithm

A first-order Hidden Markov Model (HMM) for part-of-speech (POS) tagging, implemented in Python. It includes:

counting occurrences of one part of speech following another in a training corpus,
counting occurrences of words together with parts of speech in a training corpus,
relative frequency estimation with smoothing,
finding the best sequence of parts of speech for a list of words in the test corpus, according to the HMM with smoothed probabilities,
computing the accuracy, that is, the percentage of parts of speech that are guessed correctly.
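The two counting steps and the smoothed relative-frequency estimate can be sketched in a few lines of Python. This is a minimal sketch, not the code in HMM.py or smoothing.py: the `<s>` start tag, the function names `count` and `smoothed_prob`, and the choice of add-k smoothing (Laplace when k = 1) are assumptions made for illustration, and the repository may use a different smoothing scheme.

```python
from collections import Counter, defaultdict

START = "<s>"  # hypothetical pseudo-tag marking the start of a sentence


def count(sentences):
    """Count tag bigrams and (tag, word) pairs.

    `sentences` is a list of sentences, each a list of (word, tag) pairs.
    Returns (trans, emit) where trans[prev][tag] and emit[tag][word] are counts.
    """
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sent in sentences:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1  # one part of speech following another
            emit[tag][word] += 1   # a word occurring with a part of speech
            prev = tag
    return trans, emit


def smoothed_prob(counts, context, outcome, n_outcomes, k=1.0):
    """Add-k estimate of P(outcome | context); k=1 is Laplace smoothing.

    `n_outcomes` is the number of possible outcomes (number of tags for
    transitions, vocabulary size plus one unknown-word slot for emissions).
    """
    c = counts.get(context, Counter())  # .get avoids inserting unseen contexts
    return (c[outcome] + k) / (sum(c.values()) + k * n_outcomes)


# Usage on a toy corpus
train = [[("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
         [("the", "DET"), ("cat", "NOUN")]]
trans, emit = count(train)
n_tags = len(emit)                                   # 3 tags
n_words = len({w for s in train for w, _ in s}) + 1  # 4 words + unknown slot
p_noun_after_det = smoothed_prob(trans, "DET", "NOUN", n_tags)  # (2+1)/(2+3)
p_dog_given_noun = smoothed_prob(emit, "NOUN", "dog", n_words)  # (1+1)/(2+5)
```

Because every outcome receives k extra counts, unseen transitions and unseen words still get a small non-zero probability, which keeps the decoder from ruling out a tag sequence just because one pair never occurred in training.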
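Decoding and scoring can be illustrated the same way. The sketch below is hypothetical (the names `viterbi`, `greedy` and `accuracy`, the `<s>` start tag and the toy probabilities are not taken from the repository). It contrasts Viterbi decoding, which keeps the best-scoring path into every tag at every position, with a greedy decoder that commits to the locally best tag for each word, and it works in log space to avoid numeric underflow on long sentences.

```python
import math

START = "<s>"  # hypothetical pseudo-tag marking the start of a sentence


def viterbi(words, tags, lt, le):
    """Return the most probable tag sequence under a first-order HMM.

    lt(prev, tag) gives log P(tag | prev); le(tag, word) gives log P(word | tag).
    """
    if not words:
        return []
    # best[i][t]: highest log-probability of a tag path for words[:i+1] ending in t
    best = [{t: lt(START, t) + le(t, words[0]) for t in tags}]
    back = [{}]  # back[i][t]: predecessor of t on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + lt(p, t))
            best[i][t] = best[i - 1][prev] + lt(prev, t) + le(t, words[i])
            back[i][t] = prev
    # Follow back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def greedy(words, tags, lt, le):
    """Pick the locally best tag for each word, left to right."""
    path, prev = [], START
    for w in words:
        choice = max(tags, key=lambda t: lt(prev, t) + le(t, w))
        path.append(choice)
        prev = choice
    return path


def accuracy(predicted, gold):
    """Percentage of tags guessed correctly over all sentences."""
    pairs = [(p, g) for ps, gs in zip(predicted, gold) for p, g in zip(ps, gs)]
    return 100.0 * sum(p == g for p, g in pairs) / len(pairs)


# Toy model with made-up probabilities where greedy and Viterbi disagree
T = {START: {"A": 0.5, "B": 0.5}, "A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
E = {"A": {"x": 0.6, "y": 0.2}, "B": {"x": 0.4, "y": 0.6}}


def lt(prev, tag):
    return math.log(T[prev][tag])


def le(tag, word):
    return math.log(E[tag][word])


words, tags, gold = ["x", "y"], ["A", "B"], ["B", "B"]
print(viterbi(words, tags, lt, le))                      # ['B', 'B']
print(greedy(words, tags, lt, le))                       # ['A', 'A']
print(accuracy([viterbi(words, tags, lt, le)], [gold]))  # 100.0
print(accuracy([greedy(words, tags, lt, le)], [gold]))   # 0.0
```

On the toy model, greedy decoding picks A for the first word because 0.5 × 0.6 > 0.5 × 0.4 and is then stuck on the A path, whereas Viterbi finds that B B has the highest joint probability (0.108 versus 0.054 for A A). With real smoothed estimates, `le` must also return a finite value for words that never occurred in training.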

To run:

Run HMM.py to get the accuracy of the Viterbi and greedy best-path algorithms.
Run language_comparison.py to obtain the results of the language comparisons.
Make sure the five UD (Universal Dependencies) treebanks are available.
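The UD treebanks are distributed in CoNLL-U format: one token per line with ten tab-separated columns, `#` comment lines, and a blank line after each sentence. A minimal reader that turns such lines into (word, tag) pairs might look like the following. The function name `read_conllu` and the use of the UPOS column rather than XPOS are assumptions, and this is not necessarily how the repository loads its data.

```python
def read_conllu(lines):
    """Yield sentences as lists of (FORM, UPOS) pairs from CoNLL-U lines."""
    sent = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():          # blank line ends a sentence
            if sent:
                yield sent
                sent = []
        elif line.startswith("#"):    # sentence-level comments (sent_id, text)
            continue
        else:
            cols = line.split("\t")
            # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "8.1")
            if "-" in cols[0] or "." in cols[0]:
                continue
            sent.append((cols[1], cols[3]))  # column 2 = FORM, column 4 = UPOS
    if sent:                          # the input may not end with a blank line
        yield sent


SAMPLE = ("# text = Dogs bark.\n"
          "1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_\n"
          "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\t_\n"
          "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n")

print(list(read_conllu(SAMPLE.splitlines())))
# [[('Dogs', 'NOUN'), ('bark', 'VERB'), ('.', 'PUNCT')]]
```

Because the function accepts any iterable of lines, an open file works directly, e.g. `with open(path, encoding="utf-8") as f: train = list(read_conllu(f))`, where `path` is a hypothetical path to one of the `.conllu` files inside a UD_* directory.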

Repository contents:

UD_Arabic-PADT, UD_Dutch-Alpino, UD_English-EWT, UD_French-Sequoia, UD_Spanish-GSD (the five UD treebanks)
.gitignore
HMM.py
P1getstarted.py
README.md
corpora.zip
language_comparison.py
smoothing.py
