ABSTRACT
Clustering in data analysis groups data with similar features into the same cluster. Each cluster consists of data that are more similar to one another than to data in other clusters. From a machine learning perspective, clustering can be viewed as a form of unsupervised learning. In this paper, we propose an effective method that obtains better clustering with much reduced complexity. We evaluate the performance of the classical K-Means approach to data clustering and of the proposed Far Efficient K-Means method. The accuracy of both algorithms was examined on several data sets taken from the UCI [13] repository of machine learning databases. Their clustering efficiency was compared using two typical cluster validity indices, the Davies-Bouldin Index and Dunn's Index, for different numbers of clusters. Our experimental results demonstrate that the proposed method yields considerably more efficient clustering than the K-Means algorithm when larger data sets with more attributes are considered.
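As an illustration of the evaluation setup described above, the following is a minimal sketch of the classical K-Means baseline together with the two validity indices. It does not reproduce the proposed Far Efficient K-Means method, whose details are not given in this abstract. The function names (`kmeans`, `davies_bouldin`, `dunn_index`), the random initialisation, and the choice of Euclidean distance, single-linkage separation, and maximum-diameter compactness for Dunn's Index are assumptions made for the example.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Classical (Lloyd-style) K-Means. Returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centers are k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    def assign(c):
        # Label each point with the index of its nearest center.
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2)
        return d.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        # New center = mean of its members; an empty cluster keeps its center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return assign(centers), centers


def davies_bouldin(X, labels, centers):
    """Davies-Bouldin Index (lower is better). Assumes no empty cluster."""
    k = len(centers)
    # Scatter: mean distance of a cluster's points to its center.
    scatter = np.array([
        np.linalg.norm(X[labels == j] - centers[j], axis=1).mean()
        for j in range(k)
    ])
    sep = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)  # removes the i == j pairs from the max
    ratios = (scatter[:, None] + scatter[None, :]) / sep
    return ratios.max(axis=1).mean()


def dunn_index(X, labels):
    """Dunn's Index (higher is better). Assumes at least two clusters."""
    clusters = [X[labels == j] for j in np.unique(labels)]

    def pairwise(A, B):
        return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

    # Largest within-cluster diameter.
    max_diam = max(pairwise(C, C).max() for C in clusters)
    # Smallest distance between points of two different clusters.
    min_sep = min(
        pairwise(clusters[a], clusters[b]).min()
        for a in range(len(clusters))
        for b in range(a + 1, len(clusters))
    )
    return min_sep / max_diam
```

Calling `kmeans(X, k)` for several values of `k` and passing the result to `davies_bouldin` and `dunn_index` gives one pair of scores per cluster count, which is the kind of comparison the abstract refers to.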
- Z. Li, J. Yuan, H. Yang and Ke Zhang, "K-Mean Algorithm with a Distance Based on the Characteristic of Differences", "IEEE International conference on Wireless communications, Networking and mobile computing", pp. 1--4, Oct. 2008.
- S. Saha, S. Bandyopadhyay and C. Singh, "A New Line Symmetry Distance Based Pattern Classifier", "International joint conference on Neural networks as part of 2008 IEEE WCCI", pp. 1426--1433, 2008.
- Shi Na, L. Xumin, G. Yong, "Research on K-Means clustering algorithm-An Improved K-Means Clustering Algorithm", "IEEE Third International Symposium on Intelligent Information Technology and Security Informatics", pp. 63--67, Apr. 2010.
- D. L. Davies and D. W. Bouldin, "A Cluster Separation Measure", "IEEE Trans. Pattern Analysis and Machine Intelligence", vol. 1, pp. 224--227, 1979.
- J. C. Dunn, "A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters", J. Cybernetics, vol. 3, pp. 32--57, 1973.
- T. Kanungo, D. Mount, N. Netanyahu, C. Piatko and A. Wu, "An Efficient K-Means Clustering Algorithm: Analysis and Implementation", "IEEE Transactions on Pattern analysis and Machine intelligence", vol. 24, no. 7, 2002.
- R. Xu and D. Wunsch, "Survey of Clustering Algorithms", "IEEE Transactions on Neural networks", vol. 16, no. 3, May 2005.
- Y. M. Cheung, "A New Generalized K-Means Clustering Algorithm", "Pattern Recognition Letters, Elsevier", vol. 24, issue 15, pp. 2883--2893, Nov. 2003.
- C. S. Li, "Cluster Center Initialization Method for K-means Algorithm Over Data Sets with Two Clusters", "2011 International Conference on Advances in Engineering, Elsevier", vol. 24, pp. 324--328, 2011.
- M. Erisoglu, N. Calis and S. Sakallioglu, "A new algorithm for initial cluster centers in K-Means algorithm", "Published in Pattern Recognition Letters", vol. 32, issue 14, Oct. 2011.
- D. Napoleon and P. G. Laxmi, "An Efficient K-Means Clustering Algorithm for Reducing Time Complexity using Uniform Distribution Data Points", "IEEE Trendz in Information science and computing", pp. 42--45, Feb. 2011.
- J. MacQueen, "Some methods for classification and analysis of multivariate observations", "Fifth Berkeley Symposium on Mathematics, Statistics and Probability", pp. 281--297, University of California Press, 1967.
- C. Merz and P. Murphy, UCI Repository of Machine Learning Databases, Available: ftp://ftp.ics.uci.edu/pub/machine-learning-databases.
Index Terms
Far efficient K-means clustering algorithm