Top frequency words in written/spoken Czech and English - can they be used to measure a rate of diglossia?

Publication at Faculty of Arts

2019

Abstract

The aim of our paper is to investigate the extent to which diglossia is typical of Czech, using a corpus analysis of the most frequent words in written and spoken Czech. We compare these with corresponding data for English to find out how the two languages differ from or resemble each other in their lexical core, especially in the overlap between written and spoken language, which could be one indicator of diglossia.

Our research was inspired by the Longman Dictionary of Contemporary English (LDoCE5 2009), where a simple code consisting of the letter W[ritten] / S[poken] and a number 1-3 indicates the 3,000 most frequent lemmas in written and spoken language.
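
As an illustration only, the idea of comparing the lexical core of written and spoken language can be sketched as a simple overlap ratio between two top-frequency lists. This is a minimal sketch under stated assumptions, not the procedure used in the paper: the function names `top_lemmas` and `overlap_ratio`, the toy word lists, and the choice of overlap ratio as the measure are all hypothetical.

```python
from collections import Counter


def top_lemmas(tokens, n):
    """Return the set of the n most frequent items in a list of lemmas."""
    return {lemma for lemma, _ in Counter(tokens).most_common(n)}


def overlap_ratio(written, spoken, n):
    """Share of the top-n written lemmas that also occur among the top-n spoken lemmas."""
    top_w = top_lemmas(written, n)
    top_s = top_lemmas(spoken, n)
    if not top_w:
        return 0.0
    return len(top_w & top_s) / len(top_w)


# Toy data invented for illustration; real input would be lemmatized corpus samples.
written = ["the", "of", "the", "and", "of", "the", "however", "and"]
spoken = ["the", "you", "the", "and", "you", "yeah", "the", "and"]

ratio = overlap_ratio(written, spoken, 3)
print(f"Overlap of top-3 lemmas: {ratio:.2f}")
```

Under this assumed measure, a lower ratio for one language than for another would point to a larger gap between its written and spoken lexical cores. Whether that gap reflects diglossia is the question the paper itself examines.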

Keywords

diglossia, lexicon, spoken language, written language, language corpus