UMC005: English-Urdu Parallel Corpus

Publication

Abstract

The English-Urdu Parallel Corpus is intended for training statistical machine translation systems between the two languages. It consists of four parts:

1. English-Urdu part of the EMILLE corpus;

2. texts from the Wall Street Journal (Penn Treebank);

3. translations of the Quran;

4. translations of the Bible.

The parallel data that existed before (EMILLE) have been manually cleaned anew and in full: the sentence alignment and many sentences on the Urdu side were corrected.

Keywords

umc005 english urdu parallel corpus
