This paper analyzes the frequency of six punctuation marks (the comma, period, colon, semicolon, question mark and exclamation mark) in three languages (English, French and Czech) in three different types of corpus: comparable web corpora, large monolingual general (reference) corpora and parallel (translation) corpora. The aim of the analysis is to determine which type of corpus and which methodology are most suitable for contrastive research into punctuation.
We argue that parallel corpora, used in combination with general (reference) corpora, provide the best data for such research, despite their limitations in size and composition and the potential specific features of the language of translation. Not only do parallel corpora allow for a direct comparison of the absolute frequencies of punctuation marks in typologically different languages, so that we do not need to rely on relative frequencies, but they also provide data for a more refined qualitative analysis that compares the different uses of punctuation in the languages under study.