Using Tectogrammatical Alignment in Phrase-Based Machine Translation

Publication at the Faculty of Mathematics and Physics

2009

Abstract

In this paper, we describe an experiment aimed at improving the quality of machine translation. Phrase-based machine translation, the state of the art in statistical machine translation, learns its phrase tables from large parallel corpora that must first be aligned at the word level.
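To show why phrase tables depend on word alignment, here is a minimal Python sketch of the standard consistency criterion used to extract phrase pairs in phrase-based systems. It is simplified: it does not extend spans over unaligned boundary words, as full extractors do. The function name `extract_phrases` and the `max_len` limit are illustrative assumptions, and the paper does not state which extraction procedure its experiments used.

```python
from typing import List, Set, Tuple

# A word alignment is a set of (source index, target index) links.
Alignment = Set[Tuple[int, int]]


def extract_phrases(src: List[str], tgt: List[str], links: Alignment,
                    max_len: int = 3) -> Set[Tuple[str, str]]:
    """Return all phrase pairs (up to max_len words per side) consistent with links."""
    pairs: Set[Tuple[str, str]] = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            tgt_pos = [t for (s, t) in links if s_start <= s <= s_end]
            if not tgt_pos:
                continue  # a span with no links yields no phrase pair
            t_start, t_end = min(tgt_pos), max(tgt_pos)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no target word inside the target span may be
            # linked to a source word outside the source span.
            if any(t_start <= t <= t_end and not s_start <= s <= s_end
                   for (s, t) in links):
                continue
            pairs.add((" ".join(src[s_start:s_end + 1]),
                       " ".join(tgt[t_start:t_end + 1])))
    return pairs


if __name__ == "__main__":
    src = ["the", "house", "is", "small"]
    tgt = ["dům", "je", "malý"]
    links = {(1, 0), (2, 1), (3, 2)}  # house-dům, is-je, small-malý
    for pair in sorted(extract_phrases(src, tgt, links)):
        print(pair)
```

Every extracted pair must be consistent with the alignment. A single wrong link can therefore block good phrase pairs or license bad ones, which is why better word alignment can carry over into better phrase tables.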

The most widely used word-alignment tool is GIZA++. It is general-purpose and language-independent.

In this text, we introduce a different approach: tectogrammatical alignment. It aligns only content (autosemantic) words, but on these words it outperforms GIZA++ by a wide margin.

The GIZA++ word alignment can therefore be improved using tectogrammatical alignment. When this improved alignment is used to train phrase-based translation systems, translation quality also increases slightly.
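The abstract does not describe how the two alignments are combined. The following purely hypothetical Python illustration is not necessarily the authors' procedure. It shows one simple scheme: trust the tectogrammatical links wherever they exist, and keep GIZA++ links only between tokens that the tectogrammatical aligner leaves uncovered (typically function words). The name `merge_alignments` and this override rule are assumptions made for the sketch.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (source token index, target token index)


def merge_alignments(giza: Set[Link], tecto: Set[Link]) -> Set[Link]:
    """Hypothetical merge rule (not taken from the paper).

    Tectogrammatical links cover content words only. Any GIZA++ link that
    touches a source or target token covered by a tectogrammatical link is
    dropped; the remaining GIZA++ links are kept and the tectogrammatical
    links are added.
    """
    covered_src = {s for s, _ in tecto}
    covered_tgt = {t for _, t in tecto}
    kept = {(s, t) for (s, t) in giza
            if s not in covered_src and t not in covered_tgt}
    return kept | tecto


if __name__ == "__main__":
    # the house is small / dům je malý
    giza = {(0, 0), (1, 0), (2, 1), (3, 1)}  # "small" wrongly linked to "je"
    tecto = {(1, 0), (3, 2)}                 # house-dům, small-malý
    print(sorted(merge_alignments(giza, tecto)))
```

The rule lets the more accurate content-word links override GIZA++ where both exist. Function-word links, which the tectogrammatical aligner does not produce, still come from GIZA++.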

Keywords

tectogrammatical alignment, phrase-based machine translation