The paper describes a method of dividing complex sentences into segments, easily detectable and linguistically motivated units that may be subsequently combined into clauses and thus provide a structure of a complex sentence with regard to the mutual relationship of individual clauses. The method has been developed for Czech as a language representing languages with relatively high degree of word-order freedom.
The paper introduces important terms, describes a segmentation chart, the data structure used for the description of mutual relationship between individual segments and separators. It also contains a simple set of rules applied for the segmentation of a small set of Czech sentences.
The segmentation results are evaluated against a small hand-annotated corpus of Czech complex sentences.