In this paper, we describe an experiment whose goal is to improve the quality of machine translation. Phrase-based machine translation, the state of the art in statistical machine translation, learns its phrase tables from large parallel corpora, which must first be aligned at the word level.
The most common word-alignment tool is GIZA++. It is general-purpose and language-independent.
In this text, we introduce a different approach: tectogrammatical alignment. It works on content (autosemantic) words only, but on these words it outperforms GIZA++ by a wide margin.
The GIZA++ word alignment can therefore be improved using tectogrammatical alignment, and when this improved alignment is used to train phrase-based translation systems, translation quality also increases slightly.