Improving Performance and Accuracy of Local PCA

Publication at the Faculty of Mathematics and Physics

2011

Abstract

Local Principal Component Analysis (LPCA) is a popular technique for dimensionality reduction and data compression of the large data sets encountered in computer graphics. The LPCA algorithm is a variant of k-means clustering in which the repeated classification of high-dimensional data points to their nearest cluster leads to long execution times.
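To make the cost of the classification stage concrete, the following is a minimal sketch of a plain, unaccelerated LPCA iteration, assuming NumPy and assuming that "nearest cluster" means the cluster whose local PCA subspace gives the smallest squared reconstruction error. It is not the SortCluster algorithm proposed in the paper; the names `fit_local_pca`, `reconstruction_error`, and `lpca` are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_local_pca(points, dim):
    """Return (mean, basis) of a best-fit affine subspace with at most `dim` directions."""
    mean = points.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(points - mean, full_matrices=False)
    return mean, vt[:dim]

def reconstruction_error(points, mean, basis):
    """Squared distance from each point to the affine subspace mean + span(basis)."""
    centered = points - mean
    projected = centered @ basis.T @ basis
    return np.sum((centered - projected) ** 2, axis=1)

def lpca(points, k, dim, iterations=10, seed=0):
    """Baseline LPCA with random initial labels (assumes iterations >= 1)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(points))
    for _ in range(iterations):
        errors = np.full((len(points), k), np.inf)
        for c in range(k):
            members = points[labels == c]
            if len(members) == 0:
                continue  # an empty cluster is skipped and attracts no points
            mean, basis = fit_local_pca(members, dim)
            # Classification stage: every point is tested against every cluster.
            errors[:, c] = reconstruction_error(points, mean, basis)
        labels = errors.argmin(axis=1)
    return labels, errors.min(axis=1)
```

In this sketch each iteration evaluates all point-cluster pairs, which is the repeated classification work that the abstract identifies as the source of long execution times.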

This paper focuses on improving the efficiency and accuracy of LPCA. We propose a novel SortCluster LPCA algorithm that significantly reduces the cost of the point-cluster classification stage, achieving a speed-up of up to 20×.

To improve the approximation accuracy, we adopt the k-means++ algorithm [AV07]. We show that ideas similar to those behind the efficiency of our SortCluster LPCA algorithm can also be used to accelerate k-means++.
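For reference, standard k-means++ seeding picks the first seed uniformly at random and each further seed with probability proportional to its squared distance from the nearest seed chosen so far. The sketch below assumes NumPy and plain Euclidean distance; it shows only the unaccelerated seeding rule, not the accelerated variant described in the paper, and the function name `kmeanspp_seeds` is hypothetical.

```python
import numpy as np

def kmeanspp_seeds(points, k, seed=0):
    """Return indices of k seed points chosen by k-means++ (D^2) sampling."""
    rng = np.random.default_rng(seed)
    n = len(points)
    chosen = [int(rng.integers(n))]  # first seed: uniform at random
    # Squared distance from every point to its nearest chosen seed.
    d2 = np.sum((points - points[chosen[0]]) ** 2, axis=1)
    while len(chosen) < k:
        total = d2.sum()
        if total == 0.0:
            # Every point coincides with a seed; fall back to uniform sampling.
            nxt = int(rng.integers(n))
        else:
            nxt = int(rng.choice(n, p=d2 / total))
        chosen.append(nxt)
        d2 = np.minimum(d2, np.sum((points - points[nxt]) ** 2, axis=1))
    return np.array(chosen)
```

Because an already chosen point has zero distance to itself, it has zero probability of being selected again while any other distinct point remains.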

The resulting initialization algorithm is faster than purely random seeding while producing substantially more accurate data approximation.

Keywords

Local principal component analysis, Local PCA, k-means clustering