Charles Explorer logo
🇨🇿

Atypical or underrepresented? A pilot study on small treebanks

Publikace

Tento text není v aktuálním jazyce dostupný. Zobrazuje se verze "en".Abstrakt

We illustrate an approach for multilingual treebanks explorations by introducing a novel adaptation to small treebanks of a methodology for identifying cross-lingual quantitative trends in the distribution of dependency relations. By relying on the principles of cross-validation, we reduce the amount of data required to execute the method, paving the way to expanding its use to low-resources languages.

We validated the approach on 8 small treebanks, each containing less than 100,000 tokens and representing typologically different languages. We also show preliminary but promising evidence on the use of the proposed methodology for treebank expansion.