Expanding LINDSEI to spoken learner English from several L1s across CEFR levels

Publication at Faculty of Arts


Learner corpus studies typically investigate the language of second-language learners with different first languages (L1s) or with proficiency levels inferred from external criteria (e.g., the Louvain International Database of Spoken English Interlanguage, LINDSEI; Gilquin et al., 2010). This paper reports on the process of expanding the original Czech (Gráf, 2017) and Taiwanese (Huang, 2014) sub-corpora (predominantly at B2 and C1; Huang et al., 2018) with samples from learners of other L1s across CEFR levels.

In addition to sixty interviews conducted by the German, Finnish and Norwegian LINDSEI teams, a further eighty-three interviews were held with university students in Taiwan and Finland. The data collection and transcription procedures were adapted from the LINDSEI guidelines to ensure comparability.

Each fourteen-minute interview was anonymised using Audacity, then orthographically transcribed and aligned in EXMARaLDA. Speaking proficiency in the supplemented data was assessed by two expert raters.

The expanded learner corpus, containing 243 interviews, will be of considerable value for studying the development of learner English.