Abstract
Evolutionary clustering is an emerging research area essential to important applications such as clustering dynamic Web and blog contents and clustering data streams. In evolutionary clustering, a good clustering result should fit the current data well, while simultaneously not deviate too dramatically from the recent history. To fulfill this dual purpose, a measure of temporal smoothness is integrated in the overall measure of clustering quality. In this article, we propose two frameworks that incorporate temporal smoothness in evolutionary spectral clustering. For both frameworks, we start with intuitions gained from the well-known k-means clustering problem, and then propose and solve corresponding cost functions for the evolutionary spectral clustering problems. Our solutions to the evolutionary spectral clustering problems provide more stable and consistent clustering results that are less sensitive to short-term noises while at the same time are adaptive to long-term cluster drifts. Furthermore, we demonstrate that our methods provide the optimal solutions to the relaxed versions of the corresponding evolutionary k-means clustering problems. Performance experiments over a number of real and synthetic data sets illustrate our evolutionary spectral clustering methods provide more robust clustering results that are not sensitive to noise and can adapt to data drifts.
- Aggarwal, C. C., Han, J., Wang, J., and Yu, P. S. 2003. A framework for clustering evolving data streams. In Proceedings of the 12th VLDB Conference. Google ScholarDigital Library
- Asur, S., Parthasarathy, S., and Ucar, D. 2007. An event-based framework for characterizing the evolutionary behavior of interaction graphs. In Proceedings of the 13th ACM SIGKDD Conference. ACM, New York. Google ScholarDigital Library
- Bach, F. R. and Jordan, M. I. 2006. Learning spectral clustering, with application to speech separation. J. Mach. Learn. Res. 7. Google ScholarDigital Library
- Chakrabarti, D., Kumar, R., and Tomkins, A. 2006. Evolutionary clustering. In Proceedings of the 12th ACM SIGKDD Conference. ACM, New York. Google ScholarDigital Library
- Charikar, M., Chekuri, C., Feder, T., and Motwani, R. 1997. Incremental clustering and dynamic information retrieval. In Proceedings of the 29th STOC Conference. ACM, New York. Google ScholarDigital Library
- Chatfield, C. 2003. The Analysis of Time Series: An Introduction. Chapman&Hall/CRC.Google Scholar
- Chi, Y., Song, X., Zhou, D., Hino, K., and Tseng, B. L. 2007. Evolutionary spectral clustering by incorporating temporal smoothness. In Proceedings of the 13th ACM SIGKDD Conference. ACM, New York. Google ScholarDigital Library
- Chung, F. R. K. 1997. Spectral Graph Theory. American Mathematical Society.Google Scholar
- De Lathauwer, L., De Moor, B., and Vandewalle, J. 2000. A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21, 4. Google ScholarDigital Library
- Dhillon, I. S., Guan, Y., and Kulis, B. 2004. Kernel k-means: spectral clustering and normalized cuts. In Proceedings of the 10th ACM SIGKDD Conference. Google ScholarDigital Library
- Ding, C. and He, X. 2004. K-means clustering via principal component analysis. In Proceedings of the 21st ICML Conference. Google ScholarDigital Library
- Fan, K. 1949. On a theorem of weyl concerning eigenvalues of linear transformations. In Proceedings of the National Academy of Science.Google ScholarCross Ref
- Golub, G. and Loan, C. V. 1996. Matrix Computations, third ed. Johns Hopkins University Press. Google ScholarDigital Library
- Guha, S., Mishra, N., Motwani, R., and O'Callaghan, L. 2000. Clustering data streams. In Proceedings of the IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, CA. Google ScholarDigital Library
- Gupta, C. and Grossman, R. 2004. Genic: A single pass generalized incremental algorithm for clustering. In Proceedings of the SIAM International Conference on Data Mining. SIAM, Philadelphia, PA.Google Scholar
- Hubert, L. J. and Arabie, P. 1985. Comparing partitions. J. Classif. 2.Google Scholar
- Ji, X. and Xu, W. 2006. Document clustering with prior knowledge. In Proceedings of the SIGIR. ACM, New York. Google ScholarDigital Library
- Li, Y., Han, J., and Yang, J. 2004. Clustering moving objects. In Proceedings of the 10th ACM SIGKDD Conference. ACM, New York. Google ScholarDigital Library
- Mei, Q. and Zhai, C. 2005. Discovering evolutionary theme patterns from text: An exploration of temporal text mining. In Proceedings of the 11th ACM SIGKDD Conference. ACM, New York. Google ScholarDigital Library
- Newman, M. E. J. and Girvan, M. 2004. Finding and evaluating community structure in networks. Phys. Rev. E.Google Scholar
- Ng, A., Jordan, M. and Weiss, Y. 2001. On spectral clustering: Analysis and an algorithm. In NIPS.Google Scholar
- Ning, H., Xu, W., Chi, Y., Gong, Y., and Huang, T. 2007. Incremental spectral clustering with application to monitoring of evolving blog communities. In Proceedings of the SIAM International Conference on Data Mining. SIAM, Philadelphia, PA.Google Scholar
- Palla, G., Barabasi, A.-L., and Vicsek, T. 2007. Quantifying social group evolution. Nature 446.Google Scholar
- Sarkar, P. and Moore, A. W. 2005. Dynamic social network analysis using latent space models. SIGKDD Explor. Newsl. 7, 2. Google ScholarDigital Library
- Shi, J. and Malik, J. 2000. Normalized cuts and image segmentation. IEEE Trans. Patt. Anal. Mach. Intell. 22, 8. Google ScholarDigital Library
- Spiliopoulou, M., Ntoutsi, I., Theodoridis, Y., and Schult, R. 2006. Monic: Modeling and monitoring cluster transitions. In Proceedings of the 12th ACM SIGKDD Conference. ACM, New York. Google ScholarDigital Library
- Sun, J., Faloutsos, C., Papadimitriou, S., and Yu, P. S. 2007. GraphScope: Parameter-free mining of large time-evolving graphs. In Proceedings of the 13th ACM SIGKDD Conference. ACM, New York. Google ScholarDigital Library
- Toyoda, M. and Kitsuregawa, M. 2003. Extracting evolution of web communities from a series of web archives. In HYPERTEXT '03: Proceedings of the 14th ACM Conference on Hypertext and Hypermedia. ACM, New York. Google ScholarDigital Library
- Wagstaff, K., Cardie, C., Rogers, S., and Schroedl, S. 2001. Constrained K-means clustering with background knowledge. In Proceedings of the 18th ICML Conference. Google ScholarDigital Library
- Weiss, Y. 1999. Segmentation using eigenvectors: A unifying view. In ICCV '99: Proceedings of the International Conference on Computer Vision Volume 2. Google ScholarDigital Library
- Xu, W., Liu, X., and Gong, Y. 2002. Spectral text clustering. Tech. Rep. 2002-L011, NEC Laboratories America.Google Scholar
- Xu, W., Liu, X., and Gong, Y. 2003. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th SIGIR Conference. ACM, New York. Google ScholarDigital Library
- Zha, H., He, X., Ding, C. H. Q., Gu, M., and Simon, H. D. 2001. Spectral relaxation for k-means clustering. In NIPS.Google Scholar
- Zhou, D., Huang, J., and Schölkopf, B. 2005. Learning from labeled and unlabeled data on a directed graph. In Proceedings of the 22nd ICML Conference. Google ScholarDigital Library
Index Terms
- On evolutionary spectral clustering
Recommendations
Evolutionary clustering
KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data miningWe consider the problem of clustering data over time. An evolutionary clustering should simultaneously optimize two potentially conflicting criteria: first, the clustering at any point in time should remain faithful to the current data as much as ...
Evolutionary spectral clustering by incorporating temporal smoothness
KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data miningEvolutionary clustering is an emerging research area essential to important applications such as clustering dynamic Web and blog contents and clustering data streams. In evolutionary clustering, a good clustering result should fit the current data well, ...
Efficient evolutionary spectral clustering
An evolutionary spectral clustering method using a smoothed Laplacian is presented.The incomplete Cholesky decomposition (ICD) reduces runtime from O(N3) to O(N).The memory requirements are decreased from O(N2) to O(N).A stopping criterion using the ...
Comments