
Information Retrieval Test Collection for Searching Spontaneous Czech Speech

Publication at the Faculty of Mathematics and Physics, 2007

Abstract

This paper describes the design of the first large-scale information retrieval (IR) test collection built for the Czech language. The collection is also particularly challenging: it is based on a continuous text stream produced by automatic transcription of spontaneous speech, and therefore lacks clearly defined document boundaries.

All aspects of building the collection are presented, together with some initial experiments.