Morphological homonymy in Czech affects almost 60 % of nouns; the most widespread case syncretism is nominative-accusative. This has an impact on many linguistic tasks, one of which is the morphological annotation of texts for corpus creation.
For the Czech corpus in the CNC project, both manual and automatic (statistical) disambiguation are used. At present, only a few handwritten rules resolve nominative-accusative syncretism, so a stochastic approach has to be used instead.
Given the free word order of Czech, this syncretism is one of the problematic parts of morphological annotation. Light verb constructions are analytical predicates in which the verb is semantically empty or impoverished, and the noun (often deverbal) shows both nominal and verbal properties and carries the meaning.
The verb and the noun work together almost like a phraseme, with some differences: the meaning is usually more transparent, the verbal part is variable, and the nominal part can be pronominalized. The phraseme-like, fixed structure of light verb constructions means that they can be used to determine whether a noun is in the accusative or the nominative case.
If a sentence contains two ambiguous (nominative or accusative) nouns, and we know that one of them forms a light verb construction with the verb used in that sentence, we can determine that this noun is in the case of the verb's complement (here, the accusative) with a higher probability of success than a statistical tagger. This work-in-progress contribution presents rule-based morphological disambiguation of the accusative/nominative case using light verb constructions in Czech.
The list of light verb constructions used in this study is taken from previous studies, mainly from Radimský.
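The disambiguation rule described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the study: the `Token` structure, the `disambiguate` function, and the two entries in `LVC_PAIRS` are hypothetical and are not taken from Radimský's list. The sketch also assumes that lemmas and candidate cases are already supplied by a morphological analyser, and that the complement of the light verb is always in the accusative.

```python
from dataclasses import dataclass

# Hypothetical sample entries (verb lemma, noun lemma), for illustration only.
# A real list would be compiled from previous studies.
LVC_PAIRS = {("dát", "souhlas"), ("mít", "zájem")}

# The ambiguity class this rule targets: nominative or accusative.
AMBIGUOUS = frozenset({"nom", "acc"})


@dataclass
class Token:
    form: str
    lemma: str
    pos: str                        # e.g. "NOUN", "VERB"
    cases: frozenset = frozenset()  # candidate cases from the analyser


def disambiguate(sentence):
    """Return a copy of the sentence in which every nom/acc-ambiguous noun
    that forms a listed light verb construction with a verb of the same
    sentence is resolved to the accusative. Other tokens are unchanged."""
    verb_lemmas = {t.lemma for t in sentence if t.pos == "VERB"}
    result = []
    for t in sentence:
        cases = t.cases
        in_lvc = any((v, t.lemma) in LVC_PAIRS for v in verb_lemmas)
        if t.pos == "NOUN" and cases == AMBIGUOUS and in_lvc:
            cases = frozenset({"acc"})  # case of the light verb's complement
        result.append(Token(t.form, t.lemma, t.pos, cases))
    return result


# "Úřad dal souhlas." Both nouns are formally nominative or accusative.
sentence = [
    Token("Úřad", "úřad", "NOUN", AMBIGUOUS),
    Token("dal", "dát", "VERB"),
    Token("souhlas", "souhlas", "NOUN", AMBIGUOUS),
]
resolved = disambiguate(sentence)
```

In this example only the noun "souhlas" is resolved to the accusative, because it pairs with the verb lemma "dát" in the sample list; "Úřad" stays ambiguous and would be left to other rules or to the statistical tagger. The sketch deliberately ignores syntactic structure: it matches any listed verb-noun pair within the sentence, which is a simplification.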