A first-order HMM (Hidden Markov Model) for part-of-speech (POS) tagging, developed in Python. This includes:
- counting occurrences of one part of speech following another in a training corpus,
- counting occurrences of words together with parts of speech in a training corpus,
- relative frequency estimation with smoothing,
- finding the best sequence of parts of speech for a list of words in the test corpus, according to an HMM with smoothed probabilities,
- computing the accuracy, that is, the percentage of parts of speech that are guessed correctly.
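
The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in HMM.py: the function names (`train`, `make_log_probs`, `viterbi`, `accuracy`), the `<s>`/`</s>` boundary markers, the toy corpus, and the choice of add-one (Laplace) smoothing are all assumptions made for the example, since the smoothing method actually used is not stated here.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"  # hypothetical sentence boundary markers


def train(tagged_sentences):
    """Count tag bigrams and (tag, word) pairs in a training corpus."""
    transitions, emissions, tag_counts = Counter(), Counter(), Counter()
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        tag_counts[START] += 1
        for word, tag in sentence:
            transitions[(prev, tag)] += 1
            emissions[(tag, word)] += 1
            tag_counts[tag] += 1
            vocab.add(word)
            prev = tag
        transitions[(prev, END)] += 1
    return transitions, emissions, tag_counts, vocab


def make_log_probs(transitions, emissions, tag_counts, vocab):
    """Relative frequency estimation with add-one smoothing (an assumption)."""
    tags = [t for t in tag_counts if t != START]
    n_next = len(tags) + 1    # possible successors: every tag plus END
    n_words = len(vocab) + 1  # known words plus one slot for unseen words

    def log_trans(prev, tag):
        return math.log((transitions[(prev, tag)] + 1) / (tag_counts[prev] + n_next))

    def log_emit(tag, word):
        return math.log((emissions[(tag, word)] + 1) / (tag_counts[tag] + n_words))

    return tags, log_trans, log_emit


def viterbi(words, tags, log_trans, log_emit):
    """Return the most probable tag sequence for a list of words."""
    if not words:
        return []
    # Each column maps tag -> (best log probability ending in tag, previous tag).
    best = [{t: (log_trans(START, t) + log_emit(t, words[0]), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max((best[-1][p][0] + log_trans(p, t), p) for p in tags)
            column[t] = (score + log_emit(t, word), prev)
        best.append(column)
    # Pick the best final tag, including the transition to END, then backtrack.
    last = max(tags, key=lambda t: best[-1][t][0] + log_trans(t, END))
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return path[::-1]


def accuracy(gold, predicted):
    """Percentage of tags guessed correctly over all sentences."""
    pairs = [(g, p) for gs, ps in zip(gold, predicted) for g, p in zip(gs, ps)]
    return 100.0 * sum(g == p for g, p in pairs) / len(pairs)


# Toy corpus, invented for illustration only.
train_data = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = make_log_probs(*train(train_data))
predicted = viterbi(["the", "cat", "barks"], *model)
print(predicted)  # ['DET', 'NOUN', 'VERB']
print(accuracy([["DET", "NOUN", "VERB"]], [predicted]))  # 100.0
```

Working in log space avoids numerical underflow on long sentences. A greedy decoder would instead pick the best tag at each position given only the previously chosen tag, without keeping the full table of alternatives.
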
To run:
- run 'HMM.py' to get the accuracy of the Viterbi and the greedy best-path algorithms,
- run 'language_comparison.py' to obtain the results of the language comparisons,
- make sure the 5 UD (Universal Dependencies) treebanks are available.
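
UD treebanks are normally distributed as CoNLL-U files, in which each token line has ten tab-separated columns, with the word form in column 2 and the universal POS tag in column 4. As a rough sketch of how such a file could be turned into (word, tag) sentences, assuming that format: the function name `read_conllu` and the sample text below are made up for illustration, and the loader used by the scripts in this repository may differ.

```python
def read_conllu(text):
    """Return a list of sentences, each a list of (word, tag) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        columns = line.split("\t")
        token_id = columns[0]
        if "-" in token_id or "." in token_id:
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        current.append((columns[1], columns[3]))  # FORM and UPOS columns
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample in CoNLL-U layout, not taken from a real treebank.
sample = "\n".join([
    "# text = The dog barks",
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tbarks\tbark\tVERB\t_\t_\t0\troot\t_\t_",
    "",
])
print(read_conllu(sample))
```

To read a file from disk, the contents can be passed in with something like `read_conllu(open(path, encoding="utf-8").read())`, where `path` points at one of the treebank files.
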