Building a corpus of aphasic Czech

Publication at Faculty of Arts

2015

Abstract

Introduction: The use of large corpora has become standard practice in general linguistics over the last thirty years. While such large samples of language use illustrate the lexico-grammatical patterns of a language well, there are also specialized corpora, which are typically smaller and serve as samples of special cases of language use. There are only a few corpora of aphasic speech (e.g. MacWhinney, Fromm, Forbes, & Holland, 2011).

Objectives: The project builds on the corpus linguistic resources available for Czech (e.g. Czech National Corpus, 2008) and aims to provide a source of aphasic speech data for Czech, a language that remains highly understudied in this respect (e.g. Lehečková, 2001).

Methods: Recordings of persons with aphasia were collected in two sessions: a task-based session aimed at eliciting monological discourse, and a semi-structured interview. The recordings were transcribed, lemmatized, and part-of-speech tagged. Where possible, error annotation was also included. The transcripts were also paired with the corresponding audio tracks.

Results and conclusions: In the initial phase of the project, 10 hours of aphasic speech were collected, transcribed, and annotated. The corpus has been prepared for publication and will be available to researchers and therapists, providing a much-needed tool for the study of aphasia in Czech. The project also established a standard for other researchers who wish to share their data.

Keywords

specialized corpora; aphasia; discourse production in aphasia