Challenges in Accessing Information in Digitized 19th-Century Czech Texts

Publication at Faculty of Arts

2012

Abstract

This short paper describes problems that arise in optical character recognition of, and information retrieval from, historical texts in languages with rich morphology, rather discontinuous lexical development, and a long history of spelling reforms. The problems and the proposed linguistic solutions are presented as work in progress, using the example of a current project focused on improving access to digitized Czech prints from the 19th century and the first half of the 20th century.

Keywords

Challenges Accessing Information Digitized 19th-Century Czech Texts