
The SYN concept: towards one-billion corpus of Czech

Publication at Faculty of Arts, 2009

Abstract

The paper describes the corpus SYN, a unification of the synchronic written corpora of Czech, consistently re-processed with state-of-the-art versions of the available tools. Once the newspaper corpus SYN2009PUB is included, its size will reach 1.2 billion tokens.