Building a learner corpus

Publication at Faculty of Mathematics and Physics, Faculty of Arts

2014

Abstract

The need for data about the acquisition of Czech by non-native learners prompted the compilation of the first learner corpus of Czech. After introducing its basic design and parameters, including a multi-tier manual annotation scheme and error taxonomy, we focus on the more technical aspects: transcription of hand-written source texts, the annotation process, and options for exploiting the result, together with the tools used for these tasks and the reasoning behind the choices.

To support or even substitute for manual annotation, we assign some error tags automatically and use automatic annotation tools (a tagger and a spell checker).

Keywords

building learner corpus