Full Text:   <1547>

CLC number: TP311

On-line Access: 

Received: 2006-08-22

Revision Accepted: 2006-12-05

Crosschecked: 0000-00-00

Cited: 3

Clicked: 2748

Citations:  Bibtex RefMan EndNote GB/T7714

-   Go to

Article info.
Open peer comments

Journal of Zhejiang University SCIENCE A 2007 Vol.8 No.6 P.857~863

http://doi.org/10.1631/jzus.2007.A0857


Indexing the bit-code and distance for fast KNN search in high-dimensional spaces


Author(s):  LIANG Jun-jie, FENG Yu-cai

Affiliation(s):  College of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China; more

Corresponding email(s):   ljjhubu@163.com

Key Words:  High-dimensional spaces, KNN search, Bit-code and distance based index (BD), Approximate vector


LIANG Jun-jie, FENG Yu-cai. Indexing the bit-code and distance for fast KNN search in high-dimensional spaces[J]. Journal of Zhejiang University Science A, 2007, 8(6): 857~863.

@article{title="Indexing the bit-code and distance for fast KNN search in high-dimensional spaces",
author="LIANG Jun-jie, FENG Yu-cai",
journal="Journal of Zhejiang University Science A",
volume="8",
number="6",
pages="857~863",
year="2007",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.2007.A0857"
}

%0 Journal Article
%T Indexing the bit-code and distance for fast KNN search in high-dimensional spaces
%A LIANG Jun-jie
%A FENG Yu-cai
%J Journal of Zhejiang University SCIENCE A
%V 8
%N 6
%P 857~863
%@ 1673-565X
%D 2007
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.2007.A0857

TY - JOUR
T1 - Indexing the bit-code and distance for fast KNN search in high-dimensional spaces
A1 - LIANG Jun-jie
A1 - FENG Yu-cai
J0 - Journal of Zhejiang University Science A
VL - 8
IS - 6
SP - 857
EP - 863
%@ 1673-565X
Y1 - 2007
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/jzus.2007.A0857


Abstract: 
Various index structures have recently been proposed to facilitate high-dimensional KNN queries, among which the techniques of approximate vector presentation and one-dimensional (1D) transformation can break the curse of dimensionality. Based on the two techniques above, a novel high-dimensional index is proposed, called bit-code and distance based index (BD). BD is based on a special partitioning strategy which is optimized for high-dimensional data. By the definitions of bit code and transformation function, a high-dimensional vector can be first approximately represented and then transformed into a 1D vector, the key managed by a B+-tree. A new KNN search algorithm is also proposed that exploits the bit code and distance to prune the search space more effectively. Results of extensive experiments using both synthetic and real data demonstrated that BD outperforms the existing index structures for KNN search in high-dimensional spaces.

Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article

Reference

[1] Berchtold, S., Bohm, C., Kriegel, H.P., 1998. The Pyramid-Technique: Towards Breaking the Curse of Dimensionality. Proc. ACM SIGMOD Int. Conf. on Management of Data, p.142-153.

[2] Beyer, K., Goldstein, J., Ramakrishnam, R., 1999. When is “Nearest Neighbor” Meaningful? Proc. 7th Int’l Conf. Database Theory, p.1-11.

[3] Chavez, E., Navarro, G., Baeza-Yates, R., Marroquin, J., 2001. Searching in metric spaces. ACM Computing Surveys, 33(3):273-321.

[4] Cui, B., Shen, H.T., Shen, J., Tan, K.L., 2005. Exploring Bit-Difference for Approximate KNN Search in High-dimensional Databases. Proc. 16th Australian Database Conference, p.165-174.

[5] Fonseca, M.J., Jorge, J.A., 2003. Indexing High-dimensional Data for Content-based Retrieval in Large Databases. Proc. 8th International Conference on Database Systems for Advanced Applications, p.267-274.

[6] Guha, S., Rastogi, R., Shim, K., 1998. Cure: An Efficient Clustering Algorithm for Large Databases. Proc. ACM SIGMOD Int. Conf. on Management of Data, p.73-84.

[7] Hinneburg, A., Aggarwal, C.C., Keim, D.A., 2000. What is the Nearest Neighbor in High-dimensional Spaces. Proc. 26th Int. Conf. on Very Large Data Bases, p.506-515.

[8] Hjaltson, G.R., Samet, H., 2003. Index-driven similarity search in metric spaces. ACM Trans. on Database Syst., 28(4):517-580.

[9] Jin, H., Ooi, B.C., Shen, H.T., Yu, C., Zhou, A.Y., 2003. An Adaptive and Efficient Dimensionality Reduction Algorithm for High-dimensional Indexing. Proc. Int’l Conf. Data Eng., p.87-98.

[10] Jolliffe, I.T., 1986. Principal Component Analysis. Springer-Verlag, New York.

[11] Weber, R., Schek, H.J., Blott, S., 1998. A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-dimensional Spaces. Proc. 24th Int. Conf. on Very Large Data Bases, p.194-205.

[12] Yu, C., Ooi, B.C., Tan, K.L., Jagadish, H.V., 2001. Indexing the Distance: An Efficient Method to KNN Processing. Proc. 27th Int. Conf. on Very Large Data Bases, p.421-430.

[13] Yu, C., Bressan, S., Ooi, B.C., Tan, K.L., 2004. Querying high-dimensional data in single dimensional space. Int. J. Very Large Data Bases, 13(2):105-119.

[14] Zhang, R., Ooi, B.C., Tan, K.L., 2004. Making the Pyramid Technique Robust to Query Types and Workloads. Proc. Int. Conf. on Data Eng., p.313-324.

Open peer comments: Debate/Discuss/Question/Opinion

<1>

Please provide your name, email address and a comment





Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - Journal of Zhejiang University-SCIENCE