Some Czech words are written in several different ways, e.g., lampion ∼ lampión (lampion). This variability may occur either in some inflectional wordforms only (inflectional variants), cf. hradu ∼ hradě in the locative case of the noun hrad (castle), or across the inflectional wordforms and derivatives (global variants), cf. fantazijní ∼ fantasijní in the adjective derived from the noun fantazie ∼ fantasie (fantasy).
It is reasonable to treat global variants as different words while providing formal means of interconnecting them in Natural Language Processing systems and resources. In this paper, we describe the identification of global variants in the Czech vocabulary and summarise the recent changes in the MorfFlex CZ dictionary and the DeriNet lexicon concerning this type of variant.
We reviewed several typical patterns of global variants captured in the available resources and combined a set of regular expressions with manual annotation to maximise the precision of the identification.
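The combination of regular expressions and manual annotation could be sketched roughly as follows. This is only an illustration under stated assumptions: the two substitution patterns, the toy lemma list, and the names PATTERNS, candidate_pairs, and REJECTED are hypothetical and are not taken from MorfFlex CZ, DeriNet, or the actual procedure.

```python
import re

# Hypothetical substitution patterns (assumptions for illustration only).
PATTERNS = [
    (re.compile(r"(?<=[aeiouy])z(?=[aeiouyí])"), "s"),  # fantazie ~ fantasie
    (re.compile(r"o(?=n$)"), "ó"),                        # lampion ~ lampión
]

def candidate_pairs(lemmas):
    """Return sorted (lemma, variant) pairs where a pattern maps one
    attested lemma onto another attested lemma."""
    vocab = set(lemmas)
    pairs = set()
    for lemma in vocab:
        for pattern, repl in PATTERNS:
            variant = pattern.sub(repl, lemma)
            if variant != lemma and variant in vocab:
                pairs.add((lemma, variant))
    return sorted(pairs)

# Toy vocabulary; koza (goat) and kosa (scythe) are unrelated words that
# the z ~ s pattern nevertheless matches.
lemmas = ["fantazie", "fantasie", "lampion", "lampión", "hrad", "koza", "kosa"]
candidates = candidate_pairs(lemmas)

# Stand-in for manual annotation: pairs a human annotator rejected.
REJECTED = {("koza", "kosa")}
confirmed = [pair for pair in candidates if pair not in REJECTED]
```

In this sketch the regular expressions only propose candidates; the false match koza ∼ kosa shows why a manual pass is needed before any pair is recorded as a global variant.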