Charles Explorer logo
🇬🇧

Morphological Tagging Based on Averaged Perceptron

Publication at Faculty of Mathematics and Physics |
2006

Abstract

Czech (like other Slavic languages) is well known for its complex morphology. Text processing (e.g., automatic translation, syntactic analysis...) usually requires unambiguous selection of grammatical categories (so called morphological tag) for every word in a text.

Morphological tagging consists of two parts – assigning all possible tags to every word in a text and selecting the right tag in a given context. Project Morče attempts to solve the second part, usually called disambiguation.

Using a statistical method based on the combination of a Hidden Markov Model and the AveragedAveraged Perceptron algorithm, a number of experiments have been made exploring different parameter settings of the algorithm in order to obtain the best success rate possible. Final accuracy of Morče on data from PDT 2.0 was 95.431% (results of March 2006).

So far, it is the best result for a standalone tagger.