Syntactic Identification of Occurrences of Multiword Expressions in Text using a Lexicon with Dependency Structures

Publication at Faculty of Mathematics and Physics

2013

Abstract

We deal with the syntactic identification of occurrences of multiword expressions (MWEs) from an existing dictionary in a text corpus. The MWEs we identify can be of arbitrary length and can be interrupted by other words in the surface sentence.
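To make the notion of an interrupted occurrence concrete, here is a minimal sketch of surface-level matching, assuming the text is already lemmatized. It looks for the lemmas of one dictionary MWE in their dictionary order while allowing up to `max_gap` intervening tokens between consecutive components. The function name `find_gapped_match` and the `max_gap` limit are hypothetical illustrations, not the method or parameters used in the paper.

```python
from typing import List, Optional, Tuple


def find_gapped_match(mwe: List[str], lemmas: List[str],
                      max_gap: int = 3) -> Optional[Tuple[int, ...]]:
    """Return token positions of the first occurrence of `mwe` in `lemmas`.

    Components must appear in dictionary order; at most `max_gap` other
    tokens may stand between two consecutive components (an assumed limit).
    Returns None if there is no such occurrence.
    """
    def extend(k: int, prev: int) -> Optional[Tuple[int, ...]]:
        # All components placed: success, nothing more to add.
        if k == len(mwe):
            return ()
        # Next component must lie within max_gap tokens after position prev.
        upper = min(len(lemmas), prev + 2 + max_gap)
        for i in range(prev + 1, upper):
            if lemmas[i] == mwe[k]:
                rest = extend(k + 1, i)  # backtrack if the rest cannot be placed
                if rest is not None:
                    return (i,) + rest
        return None

    if not mwe:
        return None
    for start, lemma in enumerate(lemmas):
        if lemma == mwe[0]:
            rest = extend(1, start)
            if rest is not None:
                return (start,) + rest
    return None


# "take into account" interrupted by "the costs" (lemmatized, toy English example).
sentence = ["we", "take", "the", "cost", "into", "account"]
print(find_gapped_match(["take", "into", "account"], sentence))  # (1, 4, 5)
```

Purely surface-based matching like this depends on word order and on the chosen gap limit, which is one reason to also consider deeper levels of analysis.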

We analyse and compare three approaches based on linguistic analysis at varying levels, ranging from surface word order to deep syntax. The evaluation is conducted on two corpora: the Prague Dependency Treebank and the Czech National Corpus.

We use SemLex, a dictionary of multiword expressions that was compiled by annotating the Prague Dependency Treebank and that includes deep syntactic dependency trees for all of its MWEs.
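Because the dictionary stores a dependency tree for each MWE, an occurrence can be sought in a parsed sentence regardless of word order. The sketch below illustrates this idea under simplifying assumptions: both trees use a toy format of `(lemma, head_index)` pairs with `-1` for the root, the MWE nodes are listed root first with each head preceding its dependents, and a match requires only equal lemmas and preserved parent-child edges. This format and these criteria are hypothetical; they are not SemLex's actual representation or the paper's matching rules.

```python
from typing import Dict, List, Optional, Tuple

# Toy node format (assumption): (lemma, index of head node; -1 for the root).
Node = Tuple[str, int]


def match_tree(mwe: List[Node], sent: List[Node]) -> Optional[Dict[int, int]]:
    """Map MWE node indices to sentence node indices, or return None.

    Assumes mwe[0] is the MWE root and every other MWE node's head precedes it.
    Each MWE edge head->dependent must map onto a sentence edge head->dependent.
    """
    # Precompute the dependents of every sentence node.
    children: Dict[int, List[int]] = {i: [] for i in range(len(sent))}
    for i, (_, head) in enumerate(sent):
        if head >= 0:
            children[head].append(i)

    def extend(k: int, mapping: Dict[int, int]) -> Optional[Dict[int, int]]:
        if k == len(mwe):
            return dict(mapping)
        lemma, head = mwe[k]
        # Candidates: dependents of the sentence node the MWE head is mapped to.
        for s in children[mapping[head]]:
            if sent[s][0] == lemma and s not in mapping.values():
                mapping[k] = s
                result = extend(k + 1, mapping)
                if result is not None:
                    return result
                del mapping[k]  # backtrack
        return None

    if not mwe:
        return None
    for s, (lemma, _) in enumerate(sent):
        if lemma == mwe[0][0]:
            result = extend(1, {0: s})
            if result is not None:
                return result
    return None


# Hypothetical toy tree for "take into account": take -> into -> account.
mwe = [("take", -1), ("into", 0), ("account", 1)]
# "into account we take the costs": different word order, same dependencies.
sent = [("into", 3), ("account", 0), ("we", 3),
        ("take", -1), ("the", 5), ("cost", 3)]
print(match_tree(mwe, sent))  # {0: 3, 1: 0, 2: 1}
```

The backtracking search keeps the sketch short; with large dictionaries one would likely index entries by root lemma first so that only plausible MWEs are tried against each sentence.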

Keywords

syntactic identification occurrences multiword expressions text using lexicon with dependency structures