CUNI-Malta system at CoNLL-SIGMORPHON 2019 Shared Task on Morphological Analysis and Lemmatization in context: Operation-based word formation

Publication at Faculty of Mathematics and Physics | 2019

Abstract

This paper presents the submission by the Charles University-University of Malta team to the CoNLL-SIGMORPHON 2019 Shared Task on Morphological Analysis and Lemmatization in context. We present a lemmatization model based on previous work on neural transducers (Makarov and Clematide, 2018; Aharoni and Goldberg, 2017).

The key difference is that our model transforms the whole word form at every step, instead of consuming it character by character. We propose a merging strategy inspired by Byte-Pair Encoding that reduces the space of valid operations by merging frequent adjacent operations.
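The merging idea can be illustrated with a minimal sketch. It is not the authors' implementation: the operation labels (such as "del_suffix:g"), the "+" join used for merged operations, and the names most_frequent_pair, merge_pair, learn_merges and min_count are all hypothetical, chosen only to show how a BPE-style loop repeatedly fuses the most frequent pair of adjacent operations.

```python
from collections import Counter


def most_frequent_pair(sequences):
    """Return ((op_a, op_b), count) for the most frequent adjacent pair, or None."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    if not counts:
        return None
    return counts.most_common(1)[0]


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged operation."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            # Hypothetical convention: a merged operation is the two labels joined by "+".
            merged.append(seq[i] + "+" + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_merges(sequences, num_merges, min_count=2):
    """BPE-style loop: fuse the most frequent adjacent pair until none is frequent enough."""
    sequences = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        best = most_frequent_pair(sequences)
        if best is None or best[1] < min_count:
            break
        pair = best[0]
        merges.append(pair)
        sequences = [merge_pair(seq, pair) for seq in sequences]
    return merges, sequences


# Hypothetical operation sequences for three word forms (labels are illustrative only).
example = [
    ["del_suffix:g", "del_suffix:n", "del_suffix:i"],                  # e.g. walking -> walk
    ["del_suffix:g", "del_suffix:n", "del_suffix:i", "ins_suffix:e"],  # e.g. making -> make
    ["del_suffix:s"],                                                  # e.g. cats -> cat
]
merges, merged_sequences = learn_merges(example, num_merges=10)
print(merged_sequences[0])
# → ['del_suffix:g+del_suffix:n+del_suffix:i']
```

In this toy example the three single-character deletions collapse into one merged operation after two merges, while the rarer insertion stays separate. This is the sense in which merging shortens operation sequences and concentrates the inventory on frequent patterns.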

The resulting operations encode not only the action(s) to be performed but also the relative position in the word token and how characters need to be transformed. Our morphological tagger is a vanilla biLSTM tagger that operates over operation representations, encoding operations and words in a hierarchical manner.

Even though relative performance according to metrics is below the baseline, experiments show that our mod