Universal Dependencies for Slovenian: upgrading the guidelines, training data and parsing model

Publication

Abstract

Universal Dependencies (UD) is an internationally harmonized annotation scheme for cross-linguistically comparable morphological and syntactic annotation of texts according to the principles of dependency grammar. It has been successfully applied to more than 130 other languages and is also established for the annotation of texts in Slovenian. In this paper, we present the results of recent UD-related activities within the project Development of Slovene in a Digital Environment, in which we upgraded the existing infrastructure by revising and documenting in detail the UD annotation guidelines for Slovenian, expanding the SSJ-UD treebank of written Slovenian with new sentences from the ssj500k and ELEXIS-WSD corpora, and creating a new dependency parsing model for the CLASSLA-Stanza annotation tool.

To support further applications in various areas of Slovenian language processing, we evaluate the new model in more detail: in addition to overall parsing accuracy, we report accuracy at the level of individual syntactic relations and describe the most frequent types of errors.

Keywords

Universal Dependencies, Slovenian language, parsing