
Improving Dependency Parsing by Filtering Linguistic Noise

Publication at the Faculty of Arts | 2013

Abstract

In this paper, we describe a way to improve stochastic dependency parsing by simplifying both the training data and the new text to be parsed. Many parsing errors are due to the limited size of the training data, in which most words of a given language occur rarely or not at all, so the parser cannot learn their syntactic properties.

By defining narrow classes of words with identical syntactic properties and replacing the members of each class by a single representative, we simplify the language modeling done by the parser and improve its accuracy. In our experiment, a 17.8% decrease in word-form variability in the training data of the Czech dependency treebank PDT led to an 8.1% relative error reduction.
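The sketch below illustrates the general idea of the preprocessing step, not the authors' actual implementation: members of a narrow, syntactically homogeneous class are mapped to one representative form before training and parsing, and the original forms are restored afterwards. The class definitions (cardinal numerals, Czech weekday names) and function names are illustrative assumptions only.

```python
import re

# Hypothetical narrow classes: each pattern maps its members to one representative form.
NARROW_CLASSES = [
    (re.compile(r"^\d+$"), "1"),  # all cardinal numerals -> "1"
    (re.compile(r"^(pondělí|úterý|středa|čtvrtek|pátek|sobota|neděle)$", re.IGNORECASE),
     "pondělí"),                  # Czech weekday names -> "pondělí"
]


def simplify(tokens):
    """Replace each member of a narrow class by its representative,
    keeping the original forms so they can be restored after parsing."""
    simplified, originals = [], []
    for tok in tokens:
        originals.append(tok)
        for pattern, representative in NARROW_CLASSES:
            if pattern.match(tok):
                tok = representative
                break
        simplified.append(tok)
    return simplified, originals


def restore(originals):
    """Put the original word forms back into the parsed sentence
    (the tree structure produced by the parser is unchanged)."""
    return list(originals)


if __name__ == "__main__":
    sentence = ["Přijedu", "v", "pátek", "ve", "14", "hodin", "."]
    simple, orig = simplify(sentence)
    print(simple)          # ['Přijedu', 'v', 'pondělí', 've', '1', 'hodin', '.']
    print(restore(orig))   # original sentence, with the parser's tree attached in practice
```

In this setup the same mapping is applied both to the treebank used for training and to any new text, so the parser sees fewer distinct word forms and can generalize from the representative's syntactic behavior to every member of its class.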