Combining Manual and Automatic Annotation of a Learner Corpus

Publication at Faculty of Mathematics and Physics, Faculty of Arts

2012

Abstract

We present an approach to building a learner corpus of Czech, manually corrected and annotated with error tags using a complex grammar-based taxonomy of errors in spelling, morphology, morphosyntax, lexicon and style. This grammar-based annotation is supplemented by a formal classification of errors based on surface alternations.

To supply additional information about non-standard or ill-formed expressions, we aim to combine manual and automatic annotation, deriving this information both from the original input and from the manual annotation.

Keywords

combining manual automatic annotation learner corpus