Full Text:   <1895>

CLC number: TP301.6

On-line Access: 2012-10-01

Received: 2012-03-22

Revision Accepted: 2012-07-23

Crosschecked: 2012-09-11

Cited: 4

Clicked: 2612

Citations:  Bibtex RefMan EndNote GB/T7714

-   Go to

Article info.
1. Reference List
Open peer comments

Journal of Zhejiang University SCIENCE C 2012 Vol.13 No.10 P.761-768

http://doi.org/10.1631/jzus.C1200078


An accelerated K-means clustering algorithm using selection and erasure rules


Author(s):  Suiang-Shyan Lee, Ja-Chen Lin

Affiliation(s):  Department of Computer Science, National Chiao Tung University, Taiwan 30050, Hsinchu

Corresponding email(s):   ignoreswing.cs98g@g2.nctu.edu.tw, jclin@cs.nctu.edu.tw

Key Words:  K-means clustering, Acceleration, Vector quantization, Selection, Erasure


Suiang-Shyan Lee, Ja-Chen Lin. An accelerated K-means clustering algorithm using selection and erasure rules[J]. Journal of Zhejiang University Science C, 2012, 13(10): 761-768.

@article{title="An accelerated K-means clustering algorithm using selection and erasure rules",
author="Suiang-Shyan Lee, Ja-Chen Lin",
journal="Journal of Zhejiang University Science C",
volume="13",
number="10",
pages="761-768",
year="2012",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C1200078"
}

%0 Journal Article
%T An accelerated K-means clustering algorithm using selection and erasure rules
%A Suiang-Shyan Lee
%A Ja-Chen Lin
%J Journal of Zhejiang University SCIENCE C
%V 13
%N 10
%P 761-768
%@ 1869-1951
%D 2012
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.C1200078

TY - JOUR
T1 - An accelerated K-means clustering algorithm using selection and erasure rules
A1 - Suiang-Shyan Lee
A1 - Ja-Chen Lin
J0 - Journal of Zhejiang University Science C
VL - 13
IS - 10
SP - 761
EP - 768
%@ 1869-1951
Y1 - 2012
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/jzus.C1200078


Abstract: 
The K-means method is a well-known clustering algorithm with an extensive range of applications, such as biological classification, disease analysis, data mining, and image compression. However, the plain K-means method is not fast when the number of clusters or the number of data points becomes large. A modified K-means algorithm was presented by Fahim et al. (2006). The modified algorithm produced clusters whose mean square error was very similar to that of the plain K-means, but the execution time was shorter. In this study, we try to further increase its speed. There are two rules in our method: a selection rule, used to acquire a good candidate as the initial center to be checked, and an erasure rule, used to delete one or many unqualified centers each time a specified condition is satisfied. Our clustering results are identical to those of Fahim et al. (2006). However, our method further cuts computation time when the number of clusters increases. The mathematical reasoning used in our design is included.

Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article

Reference

[1]Chen, L.S.T., Su, W.K., Lin, J.C., 2009. Secret image sharing based on vector quantization. Int. J. Circ. Syst. Signal Process., 3(3):137-144.

[2]Crespo, F., Weber, R., 2005. A methodology for dynamic data mining based on fuzzy clustering. Fuzzy Sets Syst., 150(2):267-284.

[3]Fahim, A.M., Salem, A.M., Torkey, F.A., Ramadan, M.A., 2006. An efficient enhanced k-means clustering algorithm. J. Zhejiang Univ.-Sci. A, 7(10):1626-1633.

[4]Frank, A., Asuncion, A., 2010. UCI Machine Learning Repository. Schools of Information and Computer Science, University of California, Irvine, CA. Available from http://archive.ics.uci.edu/ml [Accessed on July 19, 2012].

[5]Kong, W.Z., Zhu, S.A., 2007. Multi-face detection based on downsampling and modified subtractive clustering for color images. J. Zhejiang Univ.-Sci. A, 8(1):72-78.

[6]Lee, W.J., Chung, J.S., Ouyang, C.S., Lee, S.J., 2007. Vector quantization of images using a fuzzy clustering method. Cybern. Syst., 39(1):45-60.

[7]Leng, J., Hong, T.P., 2010. Mining outliers in correlated subspaces for high dimensional data sets. Fundam. Inform., 98(1):71-86.

[8]Lin, H.J., Yan, F.W., Kao, Y.T., 2005. An efficient GA-based clustering technique. Tamkang J. Sci. Eng., 8(2):113-122.

[9]Lin, J.C., 1996. Multi-class clustering by analytical two-class formulas. Int. J. Pattern Recogn. Artif. Intell., 10(4):307-323.

[10]Lu, J.F., Tang, J.B., Tang, Z.M., Yang, J.Y., 2008. Hierarchical initialization approach for K-means clustering. Pattern Recogn. Lett., 29(6):787-795.

[11]Mahajan, M., Nimbhorkar, P., Varadarajan, K., 2009. The Planar K-means Problem is NP-Hard. 3rd Int. Workshop on Algorithms and Computation, p.274-285.

[12]Seligson, D.B., Horvath, S., Shi, T., Yu, H., Tze, S., Grunstein, M., Kurdistani, S.K., 2005. Global histone modification patterns predict risk of prostate cancer recurrence. Nature, 435(7046):1262-1266.

[13]Theodoridis, S., Koutroumbas, K., 2009. Chapter 13—Clustering Algorithms II: Hierarchical Algorithms. In: Pattern Recognition (4th Ed.). Academic Press, Elsevier, London, p.653-700.

[14]Wang, R.Z., Tsai, Y.D., 2007. An image-hiding method with high hiding capacity based on best-block matching and k-means clustering. Pattern Recogn., 40(2):398-409.

[15]Wittkop, T., Emig, D., Lange, S., Rahmann, S., Albrecht, M., Morris, J.H., Bocker, S., Stoye, J., Baumbach, J., 2010. Partitioning biological data with transitivity clustering. Nat. Methods, 7(6):419-420.

[16]Yue, S.H., Li, P., Guo, J.D., Zhou, S.G., 2005. A statistical information-based clustering approach in distance space. J. Zhejiang Univ.-Sci., 6A(1):71-78.

Open peer comments: Debate/Discuss/Question/Opinion

<1>

Please provide your name, email address and a comment





Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - Journal of Zhejiang University-SCIENCE