InterCorp release 13ud contains the same texts as InterCorp release 13, but the two releases differ significantly in their linguistic annotation.
Of the 41 languages in 13ud (Czech included), 36 are annotated uniformly according to the Universal Dependencies (UD) standard using the UDPipe tool (see https://universaldependencies.org and https://ufal.mff.cuni.cz/udpipe). The uniform annotation covers tokenization, word classes, morphological categories, syntactic structure and syntactic functions.
Several changes make the corpus easier to use in the KonText search engine. Attributes were added for navigating the syntactic structure, and frequently used categories were added to the list of directly available attributes. Forms composed of two or three syntactic words are encoded as split tokens. A query helper lets users specify word classes and categories according to UD.