Length of non-projective sentences: A pilot study using a Czech UD treebank

Publication at Faculty of Arts | 2019

Abstract

The lengths (in words) of projective and non-projective sentences in a Czech UD dependency treebank are compared. Non-projective sentences are shown to be significantly longer; the same result was also obtained in this study for Arabic, Polish, Russian, and Slovak.

The hyperpascal distribution, which has been suggested as a model for the frequency distribution of sentence length measured in words, fits the data well for both projective and non-projective sentences; however, its parameters take different values in the two groups. The proportions of non-projective sentences in the treebanks used are presented, together with a discussion of factors that can influence them.
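
The distinction between projective and non-projective sentences can be illustrated with a minimal sketch. It is not taken from the study. It assumes the common definition under which a dependency arc is projective if every word lying between the head and the dependent is a descendant of the head, and it assumes each sentence is given as a list of 1-based head indices with 0 marking the root (as in the HEAD column of CoNLL-U files). The function names `is_projective` and `lengths_by_projectivity` are hypothetical.

```python
def is_projective(heads):
    """Return True if the dependency tree has no non-projective arc.

    heads[i] is the 1-based index of the head of token i + 1; 0 marks the root.
    Assumes the input is a well-formed tree (no cycles).
    """
    n = len(heads)
    for dep in range(1, n + 1):
        head = heads[dep - 1]
        if head == 0:
            continue  # the arc from the artificial root is skipped
        lo, hi = sorted((head, dep))
        for mid in range(lo + 1, hi):
            # Climb from the intervening word towards the root;
            # the arc is projective only if the climb passes through `head`.
            node = mid
            while node != 0 and node != head:
                node = heads[node - 1]
            if node != head:
                return False
    return True


def lengths_by_projectivity(sentences):
    """Split sentence lengths (in words) into projective and non-projective groups."""
    projective, non_projective = [], []
    for heads in sentences:
        target = projective if is_projective(heads) else non_projective
        target.append(len(heads))
    return projective, non_projective
```

For example, a three-word sentence with heads `[2, 0, 2]` is projective, whereas a four-word sentence with heads `[3, 4, 0, 3]` is not, because the arc from word 4 to word 2 spans the root word 3. The two length lists returned by `lengths_by_projectivity` could then be compared with a statistical test of one's choice; the abstract does not say which test the study used.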