In this paper, we describe the differences between classical word alignment on the surface layer (word-layer alignment) and alignment of deep syntactic sentence representations (tectogrammatical alignment). The deep structures we use are dependency trees containing content (autosemantic) words as their nodes.
Most other function words, such as prepositions, articles, and auxiliary verbs, are hidden. We introduce an algorithm that aligns such trees using a perceptron-based scoring function.
For evaluation purposes, a set of parallel sentences was manually aligned. We show that using statistical word alignment (GIZA) can improve the tectogrammatical alignment.
Surprisingly, we also show that the tectogrammatical alignment can then be used to significantly improve the original word alignment.
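The perceptron-based scoring of tree node pairs can be illustrated with a minimal sketch. This is not the authors' implementation: the feature set, weights, lexicon, and the greedy one-to-one matcher below are all illustrative assumptions.

```python
# Sketch of aligning content-word nodes of two dependency trees with a
# perceptron-style linear scorer. Features, weights, and the greedy
# matching strategy are hypothetical, for illustration only.

def features(src, tgt, lexicon):
    """Binary features for a candidate node pair (assumed feature set)."""
    return {
        "in_lexicon": (src["lemma"], tgt["lemma"]) in lexicon,
        "same_pos": src["pos"] == tgt["pos"],
        "similar_depth": abs(src["depth"] - tgt["depth"]) <= 1,
    }

def score(feats, weights):
    """Linear (perceptron-style) score: weighted sum of fired features."""
    return sum(weights[name] for name, fired in feats.items() if fired)

def align(src_nodes, tgt_nodes, weights, lexicon, threshold=0.0):
    """Greedily link the best-scoring node pairs above a threshold."""
    pairs = sorted(
        ((score(features(s, t, lexicon), weights), i, j)
         for i, s in enumerate(src_nodes)
         for j, t in enumerate(tgt_nodes)),
        reverse=True,
    )
    links, used_src, used_tgt = [], set(), set()
    for sc, i, j in pairs:
        if sc > threshold and i not in used_src and j not in used_tgt:
            links.append((i, j))
            used_src.add(i)
            used_tgt.add(j)
    return links

# Toy example: two two-node "trees" (Czech/English content words).
weights = {"in_lexicon": 2.0, "same_pos": 1.0, "similar_depth": 0.5}
lexicon = {("pes", "dog")}
src = [{"lemma": "pes", "pos": "N", "depth": 2},
       {"lemma": "štěkat", "pos": "V", "depth": 1}]
tgt = [{"lemma": "dog", "pos": "N", "depth": 2},
       {"lemma": "bark", "pos": "V", "depth": 1}]
links = align(src, tgt, weights, lexicon)  # → [(0, 0), (1, 1)]
```

In a real system the weights would be learned from the manually aligned sentences (e.g. by perceptron updates against gold links) rather than set by hand as above.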