
Modified LSI Model for Efficient Search by Metric Access Methods

Publication at the Faculty of Mathematics and Physics, Central Library | 2005

Abstract

Text collections represented in the LSI model are hard to search efficiently (i.e., quickly), since no indexing method exists for LSI matrices. The inverted file, often used in both the boolean and the classic vector model, cannot be used effectively, because query vectors in the LSI model are dense.

One possible way to search LSI matrices efficiently is to use metric access methods (MAMs). Instead of the cosine measure, MAMs can use the deviation metric as an equivalent dissimilarity measure for query processing.
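The relationship between the cosine measure and a metric suitable for MAMs can be sketched as follows. The sketch assumes that the deviation metric is the angle between two vectors, i.e. the arccosine of their cosine similarity; the function names are illustrative and are not taken from the paper.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def deviation_metric(x, y):
    """Angle between x and y in radians (assumed form of the deviation metric)."""
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    c = max(-1.0, min(1.0, cosine_similarity(x, y)))
    return math.acos(c)


if __name__ == "__main__":
    query = [1.0, 0.0]
    docs = {"a": [2.0, 0.1], "b": [1.0, 1.0], "c": [0.0, 3.0]}
    # A larger cosine similarity corresponds to a smaller angle, so sorting by
    # ascending angle produces the same ranking as descending cosine similarity.
    for name in sorted(docs, key=lambda n: deviation_metric(query, docs[n])):
        print(name, deviation_metric(query, docs[name]))
```

Because arccosine is monotonically decreasing, both measures rank documents identically. The angle also satisfies the triangle inequality, whereas a simple transformation such as 1 - cos does not, and MAMs rely on that property to prune the search.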

However, the intrinsic dimensionality of collections represented by LSI matrices is often large, which degrades the search performance of MAMs. In this paper we introduce $\sigma$-LSI, a modification of LSI in which we artificially decrease the intrinsic dimensionality of LSI matrices.
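The abstract does not define intrinsic dimensionality. A commonly used estimate in the metric indexing literature is $\rho = \mu^2 / (2\sigma^2)$, where $\mu$ and $\sigma^2$ are the mean and variance of the pairwise distance distribution. The sketch below assumes that this is the definition meant; the function names and the choice of Euclidean distance as a default are illustrative.

```python
import math
from itertools import combinations
from statistics import mean, pvariance


def euclidean(x, y):
    """Euclidean distance, used here only as a default example metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def intrinsic_dimensionality(points, distance=euclidean):
    """Estimate rho = mu^2 / (2 * sigma^2) from all pairwise distances.

    Raises ZeroDivisionError when every pairwise distance is identical.
    """
    distances = [distance(x, y) for x, y in combinations(points, 2)]
    mu = mean(distances)
    variance = pvariance(distances)  # population variance of the distances
    return mu * mu / (2.0 * variance)
```

A high value means the distances are concentrated around their mean, so objects look roughly equally far from one another and a MAM can exclude few of them during a query.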

This is achieved by adjusting the singular values produced by SVD. We show that suitable adjustments can dramatically improve the efficiency of searching by MAMs,...
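To make the idea of adjusting singular values concrete, the sketch below scales each document's concept-space coordinates by a function of the singular values. It relies on the standard LSI construction in which document vectors are the rows of $V_k$ scaled by the singular values. The abstract does not state which adjustment $\sigma$-LSI uses, so the squaring function and the toy numbers here are purely hypothetical placeholders.

```python
def document_vectors(v_rows, sigmas, adjust=lambda s: s):
    """Scale each document's concept coordinates by (adjusted) singular values.

    v_rows: one row of V_k per document (unscaled concept coordinates).
    sigmas: the k singular values kept by the truncated SVD.
    adjust: function applied to every singular value before scaling;
            the identity reproduces plain LSI document vectors.
    """
    adjusted = [adjust(s) for s in sigmas]
    return [[coord * s for coord, s in zip(row, adjusted)] for row in v_rows]


# Hypothetical toy input, not data from the paper.
sigmas = [4.0, 2.0, 1.0]
v_rows = [[0.6, 0.8, 0.0], [0.8, -0.6, 0.0], [0.0, 0.0, 1.0]]

plain = document_vectors(v_rows, sigmas)
# Hypothetical adjustment: squaring gives more weight to the leading concepts.
emphasized = document_vectors(v_rows, sigmas, adjust=lambda s: s ** 2)
```

Any such adjustment changes the pairwise distances between documents, and therefore both the distance distribution that a MAM indexes and, potentially, the retrieval results.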