Improving SMT by Using Parallel Data of a Closely Related Language

Publication at Faculty of Mathematics and Physics

2012

Abstract

The amount of training data available to a statistical machine translation (SMT) system is critical for translation quality. In this paper, we show how translation quality for one language pair can be improved by adding parallel data from a closely related language.

In particular, we improve English→Slovak (en→sk) translation using a large Czech–English parallel corpus and a shallow (rule-based) MT system for Czech→Slovak (cs→sk). We explore several setups to identify the best-performing configuration.
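One plausible reading of this setup is that the cs→sk system translates the Czech side of the Czech–English corpus, producing synthetic English–Slovak sentence pairs that can be combined with genuine en–sk data. The sketch below illustrates only that idea, under that assumption. The function names, the toy word-by-word dictionary standing in for the shallow cs→sk system, and plain concatenation of the corpora are all hypothetical; they do not reproduce the paper's actual MT system or the specific setups it compares.

```python
from typing import Callable, Iterable, List, Tuple

# A sentence pair (source, target); hypothetical representation for illustration.
Pair = Tuple[str, str]


def synthesize_en_sk(cs_en: Iterable[Pair],
                     cs_to_sk: Callable[[str], str]) -> List[Pair]:
    """Turn (cs, en) pairs into synthetic (en, sk) pairs by translating the Czech side."""
    return [(en, cs_to_sk(cs)) for cs, en in cs_en]


def build_training_corpus(en_sk: Iterable[Pair], cs_en: Iterable[Pair],
                          cs_to_sk: Callable[[str], str]) -> List[Pair]:
    """Genuine en-sk pairs followed by synthetic ones (plain concatenation, one option of many)."""
    return list(en_sk) + synthesize_en_sk(cs_en, cs_to_sk)


# Hypothetical toy stand-in for a shallow cs->sk system: word-by-word dictionary lookup,
# leaving unknown words unchanged.
TOY_CS_SK = {"dobrý": "dobrý", "den": "deň", "děkuji": "ďakujem"}


def toy_cs_to_sk(sentence: str) -> str:
    return " ".join(TOY_CS_SK.get(word, word) for word in sentence.split())


# Usage with tiny made-up corpora.
cs_en = [("dobrý den", "good day"), ("děkuji", "thank you")]
en_sk = [("hello", "ahoj")]
corpus = build_training_corpus(en_sk, cs_en, toy_cs_to_sk)
print(corpus)
```

Keeping the genuine and synthetic data as separate lists before combining them leaves room to try alternatives to concatenation, such as training separate models on each source, which is the kind of setup choice the abstract says is explored.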

Keywords

improving SMT, parallel data, closely related language