The article presents the recently completed Czech subcorpus of LINDSEI, the multinational learner corpus of advanced spoken English, and aims to draw attention to some of the methodological concerns that the field of learner corpus linguistics faces. First, it describes the Louvain family of learner corpora, within which this project originated, and provides a detailed description of LINDSEI: its history, design, structure, transcription system and metadata.
It then outlines the nature of the Czech subcorpus LINDSEI_CZ, recounting its compilation and providing a quantitative description of the corpus size, task sizes and learner variables, as well as a description of the transcription process. The core part of the text discusses methodological concerns affecting learner corpus design and construction, addressing such issues as task design, recording instructions, learner-participant proficiency, and the transcription system employed.
This discussion concludes by considering various methodological suggestions and proposes that, despite certain weaknesses, LINDSEI is an invaluable source of highly authentic learner data. The last section provides a thematic categorisation of existing studies based on LINDSEI and closes with descriptions of some future projects.
The article calls for a thorough reconsideration of learner corpus design and practice and for the formulation of compilation and research standards which would lead to an increase in the reliability and exploitation potential of learner corpora.