Contribution Towards a Corpus-Based Phraseology Minimum

Publication at Faculty of Arts

2017

Abstract

This paper attempts to compile a list of the most commonly used (most typical) Czech idioms from corpus data with annotated collocations. The collocations are annotated in corpora of contemporary written Czech as well as in a corpus of spoken Czech containing transcripts of intimate conversations.

Idioms are selected based on their frequency in different text types (newspapers and magazines, non-fiction, fiction, spoken language), and an idiom enters the resulting list only if it occurs in at least two different text types. Each text type is briefly characterized in terms of the types of idioms typical for it (according to formal criteria).
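The selection criterion can be illustrated with a minimal sketch. It assumes the idiom frequencies are already available as one count table per text type. The function name `select_idioms`, the `min_freq` threshold, and the toy counts are hypothetical and not taken from the paper, which does not specify a frequency cut-off in this abstract.

```python
from collections import defaultdict


def select_idioms(freq_by_type, min_types=2, min_freq=1):
    """Return idioms attested in at least `min_types` text types.

    `freq_by_type` maps a text type to a {idiom: frequency} table.
    `min_freq` is an assumed threshold for counting an idiom as attested.
    """
    types_per_idiom = defaultdict(set)
    for text_type, counts in freq_by_type.items():
        for idiom, freq in counts.items():
            if freq >= min_freq:
                types_per_idiom[idiom].add(text_type)
    return sorted(
        idiom
        for idiom, types in types_per_idiom.items()
        if len(types) >= min_types
    )


# Toy counts, invented for illustration only (not corpus data).
toy_counts = {
    "newspapers and magazines": {"idiom_a": 12, "idiom_b": 3},
    "non-fiction": {"idiom_a": 5},
    "fiction": {"idiom_b": 7, "idiom_c": 4},
    "spoken language": {"idiom_c": 0, "idiom_d": 9},
}

selected = select_idioms(toy_counts)
print(selected)
```

With these toy counts, only the idioms attested in two text types are kept. An idiom found in a single text type is dropped however frequent it is there.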

This study confirms a substantial divide between idiom use in written and spoken language. A smaller difference can be observed between fiction on the one hand and non-fiction and newspapers on the other.

The main reason for this is the interactive nature of fiction texts, which leads them to contain idioms with verbal components. These are employed in a fashion similar to spoken language, in interactions among the individual characters.

By contrast, non-fiction and journalistic language tend to be more descriptive, with more nominal idioms.

Keywords

Corpus-Based Phraseology, Idiom, Phraseology Minimum