CLC number: TP301.6

Received: 2006-03-15

Revision Accepted: 2006-05-11

Journal of Zhejiang University SCIENCE A 2006 Vol.7 No.10 P.1626~1633

http://doi.org/10.1631/jzus.2006.A1626


An efficient enhanced k-means clustering algorithm


Author(s):  FAHIM A.M., SALEM A.M., TORKEY F.A., RAMADAN M.A.

Affiliation(s):  Department of Mathematics, Faculty of Education, Suez Canal University, Suez city, Egypt

Corresponding email(s):   ahmmedfahim@yahoo.com

Key Words:  Clustering algorithms, Cluster analysis, k-means algorithm, Data analysis


FAHIM A.M., SALEM A.M., TORKEY F.A., RAMADAN M.A. An efficient enhanced k-means clustering algorithm[J]. Journal of Zhejiang University Science A, 2006, 7(10): 1626-1633.

@article{fahim2006efficient,
title="An efficient enhanced k-means clustering algorithm",
author="FAHIM, A.M. and SALEM, A.M. and TORKEY, F.A. and RAMADAN, M.A.",
journal="Journal of Zhejiang University Science A",
volume="7",
number="10",
pages="1626--1633",
year="2006",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/jzus.2006.A1626"
}

%0 Journal Article
%T An efficient enhanced k-means clustering algorithm
%A FAHIM A.M.
%A SALEM A.M.
%A TORKEY F.A.
%A RAMADAN M.A.
%J Journal of Zhejiang University SCIENCE A
%V 7
%N 10
%P 1626~1633
%@ 1673-565X
%D 2006
%I Zhejiang University Press & Springer
%R 10.1631/jzus.2006.A1626

TY  - JOUR
T1  - An efficient enhanced k-means clustering algorithm
A1  - FAHIM, A.M.
A1  - SALEM, A.M.
A1  - TORKEY, F.A.
A1  - RAMADAN, M.A.
JO  - Journal of Zhejiang University Science A
VL  - 7
IS  - 10
SP  - 1626
EP  - 1633
SN  - 1673-565X
Y1  - 2006
PB  - Zhejiang University Press & Springer
DO  - 10.1631/jzus.2006.A1626
ER  - 


Abstract: 
In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, that minimizes the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call the enhanced k-means algorithm. The algorithm is easy to implement: it requires only a simple data structure that keeps some information from each iteration for use in the next. Our experimental results demonstrate that the scheme improves the computational speed of the k-means algorithm, reducing both the total number of distance calculations and the overall computation time.
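
The abstract does not say what information is carried between iterations. The sketch below is one plausible reading, not the authors' published procedure: it assumes the cached data are each point's current cluster label and its squared distance to that cluster's center. When a center moves and the point ends up no farther from it than before, the point keeps its label and the other k-1 distance computations are skipped. All function names here (`sq_dist`, `nearest`, `update_centers`, `enhanced_kmeans`, `mean_squared_distance`) are hypothetical and chosen for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centers):
    """Return (index, squared distance) of the center closest to point."""
    j = min(range(len(centers)), key=lambda c: sq_dist(point, centers[c]))
    return j, sq_dist(point, centers[j])


def update_centers(points, labels, centers):
    """Move each center to the mean of its points; empty clusters stay put."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p, j in zip(points, labels):
        counts[j] += 1
        for t in range(dim):
            sums[j][t] += p[t]
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
        for j in range(len(centers))
    ]


def enhanced_kmeans(points, init_centers, max_iter=100):
    """k-means variant that reuses each point's previous label and distance.

    Assumption (not stated in the abstract): the cached per-point data are
    the cluster label and the squared distance to that cluster's center.
    Returns (centers, labels, number of full nearest-center searches).
    """
    centers = [list(c) for c in init_centers]
    assigned = [nearest(p, centers) for p in points]
    labels = [j for j, _ in assigned]
    dists = [d for _, d in assigned]
    full_searches = len(points)  # the first pass searches all k centers

    for _ in range(max_iter):
        centers = update_centers(points, labels, centers)
        changed = False
        for i, p in enumerate(points):
            d_same = sq_dist(p, centers[labels[i]])
            if d_same <= dists[i]:
                # Not farther from its own center than before:
                # keep the label and skip the other k-1 distances.
                dists[i] = d_same
                continue
            j, d = nearest(p, centers)
            full_searches += 1
            changed = changed or j != labels[i]
            labels[i], dists[i] = j, d
        if not changed:
            break
    return centers, labels, full_searches


def mean_squared_distance(points, centers):
    """Objective from the abstract: mean squared distance to the nearest center."""
    return sum(nearest(p, centers)[1] for p in points) / len(points)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    centers, labels, searches = enhanced_kmeans(data, [(0, 0), (1, 1)])
    print(centers, labels, searches, mean_squared_distance(data, centers))
```

Because the shortcut keeps a point in its cluster without checking the other centers, this variant is a heuristic and its final assignment can differ from that of standard k-means on some inputs. The `full_searches` counter is included only to make the saving in distance calculations visible.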
