In this paper, we explore the linguistic factors that influence an author's choice of discourse connectives in the production of a coherent text. We focus on the competition between so-called primary connectives (grammaticalized and mostly one-word expressions such as therefore) and secondary connectives (not yet fully grammaticalized compositional discourse phrases such as for this reason).
We attempt to describe the linguistic constraints on and preferences in connective selection. The analysis is based on manually annotated data from the Prague Discourse Treebank 2.0 (PDiT), which contains almost 50,000 sentences from Czech newspaper texts.
We demonstrate that discourse connectives are used in accordance with the economy principle in language, i.e., authors aim to achieve the maximal result with minimal effort. They most frequently choose short, semantically more generalized primary connectives.
However, in cases where the discourse relations can be misunderstood, authors prefer more complex and s