Visual Genome is a dataset that connects structured image information with the English language. We present "Hindi Visual Genome", a multi-modal dataset of text and images suitable for the English-Hindi multi-modal machine translation task and for multi-modal research in general.
We selected short English segments (captions) from Visual Genome together with their associated images and automatically translated them into Hindi. This was followed by careful manual post-editing that took the associated images into account.