SRS: Solving c-Approximate Nearest Neighbor Queries in High Dimensional Euclidean Space with a Tiny Index

Yifang Sun, Wei Wang, Jianbin Qin, Ying Zhang, Xuemin Lin

Published in PVLDB, 2014

Nearest neighbor searches in high-dimensional space have many important applications in domains such as data mining, and multimedia databases. The problem is challenging due to the phenomenon called 'curse of dimensionality'. An alternative solution is to consider algorithms that returns a c-approximate nearest neighbor (c-ANN) with guaranteed probabilities. Locality Sensitive Hashing (LSH) is among the most widely adopted method, and it achieves high efficiency both in theory and practice. However, it is known to require an extremely high amount of space for indexing, hence limiting its scalability.
In this paper, we propose several surprisingly simple methods to answer c-ANN queries with theoretical guarantees requiring only a single tiny index. Our methods are highly flexible and support a variety of functionalities, such as finding the exact nearest neighbor with any given probability. In the experiment, our methods demonstrate superior performance against the state-of-the-art LSH-based methods, and scale up well to 1 billion high-dimensional points on a single commodity PC.

PDF BibTex