
Skladnica: a constituency treebank of Polish harmonised with the Walenty valency dictionary



This paper reports on developments in three interrelated linguistic resources for Polish. The first is swigra 2, a rule-based constituency parser for Polish.

The second is Skladnica, a treebank built using swigra 2. The third resource is the valency dictionary Walenty, which became available when work on the first two was already advanced.

However, since the dictionary is much more comprehensive than the ad hoc dictionary used previously with swigra, a decision was made to switch the parser and the treebank to the new dictionary. The switch required several modifications to the swigra 2 parser, including the implementation of unlike coordination and the introduction of semantically motivated phrases and non-standard case values.

A semi-automated procedure for upgrading previously disambiguated trees in Skladnica was required as well. Modifications introduced in the treebank during the upgrade included systematic changes of notation and the resolution of new ambiguities arising from the more detailed distinctions made in the dictionary.

The procedure for comparing Skladnica with the trees generated by the new version of the swigra 2 parser using the Walenty dictionary allowed us to check all of these resources for consistency. This led to several corrections in both the treebank and the valency dictionary.