CLC number: TP311
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 0000-00-00
Cited: 3
Clicked: 5371
LIANG Jun-jie, FENG Yu-cai. Indexing the bit-code and distance for fast KNN search in high-dimensional spaces[J]. Journal of Zhejiang University Science A, 2007, 8(6): 857-863.
@article{title="Indexing the bit-code and distance for fast KNN search in high-dimensional spaces",
author="LIANG Jun-jie, FENG Yu-cai",
journal="Journal of Zhejiang University Science A",
volume="8",
number="6",
pages="857-863",
year="2007",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.2007.A0857"
}
%0 Journal Article
%T Indexing the bit-code and distance for fast KNN search in high-dimensional spaces
%A LIANG Jun-jie
%A FENG Yu-cai
%J Journal of Zhejiang University SCIENCE A
%V 8
%N 6
%P 857-863
%@ 1673-565X
%D 2007
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.2007.A0857
TY - JOUR
T1 - Indexing the bit-code and distance for fast KNN search in high-dimensional spaces
A1 - LIANG Jun-jie
A1 - FENG Yu-cai
J0 - Journal of Zhejiang University Science A
VL - 8
IS - 6
SP - 857
EP - 863
%@ 1673-565X
Y1 - 2007
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/jzus.2007.A0857
Abstract: Various index structures have recently been proposed to facilitate high-dimensional KNN queries, among which the techniques of approximate vector presentation and one-dimensional (1D) transformation can break the curse of dimensionality. Based on the two techniques above, a novel high-dimensional index is proposed, called bit-code and distance based index (BD). BD is based on a special partitioning strategy which is optimized for high-dimensional data. By the definitions of bit code and transformation function, a high-dimensional vector can be first approximately represented and then transformed into a 1D vector, the key managed by a B+-tree. A new KNN search algorithm is also proposed that exploits the bit code and distance to prune the search space more effectively. Results of extensive experiments using both synthetic and real data demonstrated that BD outperforms the existing index structures for KNN search in high-dimensional spaces.
[1] Berchtold, S., Bohm, C., Kriegel, H.P., 1998. The Pyramid-Technique: Towards Breaking the Curse of Dimensionality. Proc. ACM SIGMOD Int. Conf. on Management of Data, p.142-153.
[2] Beyer, K., Goldstein, J., Ramakrishnam, R., 1999. When is “Nearest Neighbor” Meaningful? Proc. 7th Int’l Conf. Database Theory, p.1-11.
[3] Chavez, E., Navarro, G., Baeza-Yates, R., Marroquin, J., 2001. Searching in metric spaces. ACM Computing Surveys, 33(3):273-321.
[4] Cui, B., Shen, H.T., Shen, J., Tan, K.L., 2005. Exploring Bit-Difference for Approximate KNN Search in High-dimensional Databases. Proc. 16th Australian Database Conference, p.165-174.
[5] Fonseca, M.J., Jorge, J.A., 2003. Indexing High-dimensional Data for Content-based Retrieval in Large Databases. Proc. 8th International Conference on Database Systems for Advanced Applications, p.267-274.
[6] Guha, S., Rastogi, R., Shim, K., 1998. Cure: An Efficient Clustering Algorithm for Large Databases. Proc. ACM SIGMOD Int. Conf. on Management of Data, p.73-84.
[7] Hinneburg, A., Aggarwal, C.C., Keim, D.A., 2000. What is the Nearest Neighbor in High-dimensional Spaces. Proc. 26th Int. Conf. on Very Large Data Bases, p.506-515.
[8] Hjaltson, G.R., Samet, H., 2003. Index-driven similarity search in metric spaces. ACM Trans. on Database Syst., 28(4):517-580.
[9] Jin, H., Ooi, B.C., Shen, H.T., Yu, C., Zhou, A.Y., 2003. An Adaptive and Efficient Dimensionality Reduction Algorithm for High-dimensional Indexing. Proc. Int’l Conf. Data Eng., p.87-98.
[10] Jolliffe, I.T., 1986. Principal Component Analysis. Springer-Verlag, New York.
[11] Weber, R., Schek, H.J., Blott, S., 1998. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-dimensional Spaces. Proc. 24th Int. Conf. on Very Large Data Bases, p.194-205.
[12] Yu, C., Ooi, B.C., Tan, K.L., Jagadish, H.V., 2001. Indexing the Distance: An Efficient Method to KNN Processing. Proc. 27th Int. Conf. on Very Large Data Bases, p.421-430.
[13] Yu, C., Bressan, S., Ooi, B.C., Tan, K.L., 2004. Querying high-dimensional data in single dimensional space. Int. J. Very Large Data Bases, 13(2):105-119.
[14] Zhang, R., Ooi, B.C., Tan, K.L., 2004. Making the Pyramid Technique Robust to Query Types and Workloads. Proc. Int. Conf. on Data Eng., p.313-324.
Open peer comments: Debate/Discuss/Question/Opinion
<1>