The InterCorp corpus, release 13

Publication

Abstract

A new release of a large parallel corpus containing translations between 41 languages (including Czech). Compared to release 12, the number of words in the foreign-language texts has increased to 1,550 million: 327 million in the fiction core and 1,223 million in the freely available collections.

The Czech texts total 203 million words: 113 million in the core and 90 million in the collections. Chinese texts have been added to the fiction core.

Slovenian is newly tagged with the ReLDI tagger.