
A New State-of-The-Art Czech Named Entity Recognizer

Publication at Faculty of Mathematics and Physics


We present a new named entity recognizer for the Czech language. It reaches 82.82 F-measure on the Czech Named Entity Corpus 1.0 and significantly outperforms previously published Czech named entity recognizers.

On the English CoNLL-2003 shared task, we achieved 89.16 F-measure, a result comparable to the English state of the art. The recognizer is based on a Maximum Entropy Markov Model: a Viterbi algorithm decodes the optimal label sequence using probabilities estimated by a maximum entropy classifier.

The classification features utilize morphological analysis, two-stage prediction, word clustering and gazetteers.
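
The decoding step described above can be illustrated with a minimal sketch of Viterbi decoding over a Maximum Entropy Markov Model. This is not the authors' implementation. The function names, the "<s>" start label, the two-label tag set, and the hand-written probability table standing in for a trained maximum entropy classifier are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

START = "<s>"  # hypothetical start-of-sentence label (an assumption)

# local_prob(prev_label, tokens, i) returns P(label | prev_label, tokens, i)
LocalProb = Callable[[str, Sequence[str], int], Dict[str, float]]


def _log(p: float) -> float:
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi_memm(tokens: Sequence[str], labels: Sequence[str],
                 local_prob: LocalProb) -> List[str]:
    """Return the most probable label sequence under the local model."""
    if not tokens:
        return []
    # best[i][y]: log-probability of the best path ending in label y at i
    first = local_prob(START, tokens, 0)
    best = [{y: _log(first[y]) for y in labels}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        # One classifier call per possible previous label
        dists = {p: local_prob(p, tokens, i) for p in labels}
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev = max(labels, key=lambda p: best[i - 1][p] + _log(dists[p][y]))
            scores[y] = best[i - 1][prev] + _log(dists[prev][y])
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final label
    path = [max(labels, key=lambda y: best[-1][y])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


LABELS = ["O", "PER"]  # toy tag set, not the corpus tag set


def toy_local_prob(prev: str, tokens: Sequence[str], i: int) -> Dict[str, float]:
    """Hand-written stand-in for a trained maximum entropy classifier."""
    if tokens[i][0].isupper():
        return {"PER": 0.8, "O": 0.2}
    if prev == "PER":
        return {"PER": 0.3, "O": 0.7}
    return {"PER": 0.1, "O": 0.9}


print(viterbi_memm(["Václav", "Havel", "spoke"], LABELS, toy_local_prob))
```

In a real recognizer, the toy probability function would be replaced by a classifier whose features draw on morphological analysis, word clusters and gazetteers, as listed above. Working in log space avoids numerical underflow on long sentences.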