Reference Data for Czech Collocation Extraction

Publication at Faculty of Mathematics and Physics | 2008

Abstract

We introduce three reference data sets provided for the MWE 2008 evaluation campaign, which focused on ranking multiword expression (MWE) candidates. The data sets comprise bigrams extracted from the Prague Dependency Treebank and the Czech National Corpus.

The extracted bigrams are annotated as either collocational or non-collocational and are provided with corpus frequency information.