The English-Urdu Parallel Corpus is intended for training statistical machine translation systems between these two languages. It consists of four parts:
1. English-Urdu part of the EMILLE corpus;
2. texts from the Wall Street Journal (Penn Treebank);
3. translations of the Quran;
4. translations of the Bible.

The previously existing parallel data (EMILLE) have been completely re-cleaned by hand: the sentence alignment has been corrected, as have many sentences on the Urdu side.