Estimating senses with sets of lexically related words for Polish word sense disambiguation

Publication

Abstract

We propose a new algorithm for word sense disambiguation, exploiting data from a WordNet with many types of lexical relations, such as plWordNet for Polish. In this method, sense probabilities in context are approximated with a language model.

To estimate the likelihood of a sense appearing within the word sequence, the token being disambiguated is replaced with words lexically related to the given sense or with words appearing in its WordNet gloss. We test this approach on a set of sense-annotated Polish sentences with a number of neural language models.
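
The substitution idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: a toy add-one-smoothed bigram model stands in for the neural language models, the scores of a sense's substitutes are combined by averaging their log-probabilities, and the corpus, the sense labels ("castle", "lock") and the substitute lists for the Polish word "zamek" are invented. The hypothetical helpers train_bigram_lm and disambiguate work on lemmatized tokens to sidestep Polish inflection.

```python
import math
from collections import Counter


def train_bigram_lm(corpus):
    """Build a toy add-one-smoothed bigram scorer (a stand-in for a neural LM)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def log_prob(tokens):
        # Sum of smoothed bigram log-probabilities over the padded sentence.
        padded = ["<s>"] + tokens + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
            for a, b in zip(padded, padded[1:])
        )

    return log_prob


def disambiguate(tokens, target_index, sense_substitutes, log_prob):
    """Return the sense whose substitutes make the sentence most likely."""
    scores = {}
    for sense, substitutes in sense_substitutes.items():
        if not substitutes:
            continue  # a sense with no substitutes cannot be scored here
        candidates = [
            log_prob(tokens[:target_index] + [word] + tokens[target_index + 1:])
            for word in substitutes
        ]
        # Assumption: average the log-probabilities of all substitutes.
        scores[sense] = sum(candidates) / len(candidates)
    return max(scores, key=scores.get), scores


# Invented, lemmatized toy corpus (illustration only).
corpus = [
    ["król", "mieszkać", "w", "pałac"],
    ["rycerz", "bronić", "twierdza"],
    ["król", "mieszkać", "w", "twierdza"],
    ["klucz", "otwierać", "kłódka"],
    ["ślusarz", "naprawiać", "zasuwa"],
]
log_prob = train_bigram_lm(corpus)

# Hypothetical substitute sets for two senses of "zamek".
sense_substitutes = {
    "castle": ["pałac", "twierdza"],
    "lock": ["kłódka", "zasuwa"],
}
sentence = ["król", "mieszkać", "w", "zamek"]
best_sense, scores = disambiguate(sentence, 3, sense_substitutes, log_prob)
print(best_sense)  # castle
```

In a realistic setting the substitute lists would come from WordNet relations or gloss words, and log_prob would be replaced by a score from a neural language model; how the individual substitute scores are best combined is left open in this sketch.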

Our best setup achieves an accuracy of 55.12% (72.02% when first senses are excluded), up from 51.77% for an existing PageRank-based method. Although this does not exceed the first-sense baseline in the standard case (the first sense is often the most frequent one), the result encourages further research on combining WordNet data with neural models.

Keywords

semantic disambiguation WordNet Polish