Syntactical transformation of the Czech Academic Corpus

Publication at Faculty of Mathematics and Physics | 2011

Abstract

The idea of the Czech Academic Corpus (CAC) came to life in 1971 thanks to the Department of Mathematical Linguistics within the Institute of Czech Language. By the mid-1980s, a total of 540,000 words had been manually annotated, both morphologically and syntactically.

After the Prague Dependency Treebank (PDT), the largest treebank of Czech written texts, had been built, a conversion of the CAC to the PDT format was started. The main goal was to make the CAC and the PDT compatible and thus to enable integration of the CAC into the PDT.

The second version of the CAC presents a complete conversion of the internal format and the annotation schemes. The conversion of the syntactic annotation started three years after the syntactic annotation of the PDT had been finished.

Such a situation is exceptional since, at least to our knowledge, there is no other language for which the annotation of a non-negligible amount of data has been done in two subsequent annotation projects. This article summarizes the experience acquired during