An error tagged corpus of Czech as a second language - recapitulation and perspectives

Publication at Faculty of Arts | 2016

Abstract

Texts produced by non-native speakers are a valuable source of information about how learners acquire a particular language and about second language acquisition in general. Collections of such texts, known as learner corpora, can be annotated in a way similar to other corpora, with morphosyntactic categories or syntactic structure.

However, their most interesting aspect is the presence of deviant language use, which can be identified, corrected and assigned a tag specifying the type of error. Annotation of this kind is a challenging task, even more so for Slavic languages, with their rich inflection, derivation, agreement, and a largely information-structure-driven constituent order.