The InterCorp corpus, release 15

Publication

Abstract

Release 15 is a new version of a large parallel corpus containing translations among a total of 42 languages (including Czech). The number of words in the foreign-language texts has increased to 1,588 million: 362 million in the fiction core and 1,226 million in the freely available collections.

The Czech texts total 210 million words: 120 million in the core and 90 million in the collections. The Project Syndicate collection now includes additional texts from 2019–2021, among them texts in Chinese and Arabic.