We have generalized a method for interpreting tandem mass spectra that is based on the parameterized Hausdorff distance. Rather than interpreting only peptides (short fragments of proteins), in this paper we describe the interpretation of whole protein sequences.
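To make the distance more concrete, the following is a minimal Python sketch of one tolerance-based variant of the Hausdorff distance between two spectra, each treated as a set of peak m/z values. The parameterization shown here (a mass tolerance `eps` and a count of unmatched peaks) is an assumption for illustration and is not necessarily the exact definition used by the method. The function names `unmatched` and `param_hausdorff` are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def unmatched(a: Sequence[float], b: Sequence[float], eps: float) -> int:
    """Count peaks in `a` that have no peak in `b` within +/- eps.

    `b` must be sorted ascending so that a binary search can be used.
    """
    count = 0
    for x in a:
        # Index of the first peak in b that is >= x - eps.
        i = bisect_left(b, x - eps)
        # Unmatched if no such peak exists or it lies beyond x + eps.
        if i == len(b) or b[i] > x + eps:
            count += 1
    return count


def param_hausdorff(a: Sequence[float], b: Sequence[float], eps: float) -> int:
    """Assumed variant: the larger of the two directed unmatched-peak counts."""
    sa, sb = sorted(a), sorted(b)
    return max(unmatched(sa, sb, eps), unmatched(sb, sa, eps))
```

For example, with `eps = 0.5` the spectra `[100.0, 200.1, 300.2]` and `[100.3, 250.0, 300.0]` are at distance 1, because only the peaks 200.1 and 250.0 lack a partner. The tolerance makes matching non-transitive, so this distance can violate the triangle inequality. With `eps = 1.0`, the spectra {0}, {1} and {2} give d({0},{1}) = d({1},{2}) = 0 but d({0},{2}) = 1. This is the kind of non-metric behavior that a technique such as TriGen is designed to handle.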
For this purpose, we employ the recently introduced NM-tree to index the database of hypothetical mass spectra for exact or fast approximate search. The NM-tree combines the M-tree with the TriGen algorithm in a way that allows the retrieval precision to be controlled dynamically at query time.
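The TriGen step can be illustrated with a minimal sketch. It assumes the fractional-power modifier family f_w(x) = x^(1/(1+w)) and a "T-error" defined as the fraction of sampled distance triplets that violate the triangle inequality after modification. The sketch scans a fixed grid of w values, which is a simplification. The names `fp_modifier`, `t_error` and `trigen` are hypothetical, and the actual TriGen algorithm and the NM-tree may use other modifier bases and search strategies.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
Triplet = Tuple[float, float, float]


def fp_modifier(w: float) -> Callable[[float], float]:
    """Fractional-power modifier: identity at w = 0, increasingly concave as w grows."""
    return lambda x: x ** (1.0 / (1.0 + w))


def t_error(triplets: Sequence[Triplet], f: Callable[[float], float]) -> float:
    """Fraction of distance triplets that are non-triangular after applying f."""
    bad = 0
    for d1, d2, d3 in triplets:
        x, y, z = sorted((f(d1), f(d2), f(d3)))
        # Small tolerance absorbs floating-point rounding.
        if x + y < z - 1e-12:
            bad += 1
    return bad / len(triplets)


def trigen(objects: Sequence[T], dist: Callable[[T, T], float], theta: float,
           w_grid: Sequence[float], sample: int = 1000,
           seed: int = 0) -> Optional[float]:
    """Return the least concave w in w_grid whose T-error is <= theta, else None.

    For a power modifier the triangle check is scale-invariant, so the
    distances are not normalized here.
    """
    rng = random.Random(seed)
    triplets: List[Triplet] = []
    for _ in range(sample):
        o1, o2, o3 = rng.sample(list(objects), 3)
        triplets.append((dist(o1, o2), dist(o2, o3), dist(o1, o3)))
    for w in sorted(w_grid):
        if t_error(triplets, fp_modifier(w)) <= theta:
            return w
    return None
```

For example, under squared Euclidean distance every triplet of distinct points on a line violates the triangle inequality. With `theta = 0` and `w_grid = [0, 0.5, 1, 2]`, the sketch therefore returns w = 1, the square root, which recovers the ordinary Euclidean metric. Roughly, θ = 0 corresponds to exact (metric) search. A larger θ admits a less concave modifier, trading some retrieval errors for faster search. This is the precision trade-off that the NM-tree lets one choose at query time.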
We propose a scheme for identifying protein sequences using the NM-tree.