Czech Version of the Multi30k dataset

Publication

Abstract

This is the Czech version of the Multi30k dataset, which is used for the WMT competitions in Multimodal Machine Translation. The dataset is based on the Flickr30k dataset, which contains more than 30,000 images accompanied by English captions.

For WMT16 and WMT17, German and French translations were added to these captions. For the WMT18 competition, we also added translations into Czech.