Exploring the complete phraseology of a single author is a very difficult task. Thanks to a unique project of building author corpora within the Czech National Corpus, such research is now possible for two important Czech writers: Karel Čapek and Bohumil Hrabal.
The corpus of Čapek's writings was created in 2007 and contains almost three million words; the Hrabal corpus, created in 2009, contains more than two million words. The corpora include all published texts of the two authors.
A tool for semi-automatic recognition of phraseology, based on a recently (re)printed dictionary of Czech phraseology and idiomatics, allows relatively complete detection of different kinds of phrasemes (nominal and verbal phrasemes, similes, etc.) occurring in a corpus, whether it is a single-author or a multiple-author corpus. This article focuses on a comparison of Čapek and Hrabal, two authors with different literary styles who lived and wrote in different periods of the 20th century, and on a subsequent comparison with the balanced corpus SYN2010, which includes journalistic, academic and literary texts (100 million words).
We pay special attention to verbal phrasemes containing the verb to have (mít) and to nominal phrasemes relating to parts of the human body, such as head, hand and eye (hlava, ruka, oko).