|
ABSTRACT
Principal component analysis (PCA) is a widely used statistical technique for unsupervised dimension reduction. K-means clustering is a commonly used data clustering for performing unsupervised learning tasks. Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering. New lower bounds for K-means objective function are derived, which is the total variance minus the eigenvalues of the data covariance matrix. These results indicate that unsupervised dimension reduction is closely related to unsupervised learning. Several implications are discussed. On dimension reduction, the result provides new insights to the observed effectiveness of PCA-based data reductions, beyond the conventional noise-reduction explanation that PCA, via singular value decomposition, provides the best low-dimensional linear approximation of the data. On learning, the result suggests effective techniques for K-means data clustering. DNA gene expression and Internet newsgroups are analyzed to illustrate our results. Experiments indicate that the new bounds are within 0.5-1.5% of the optimal values.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Alizadeh, A., Eisen, M., Davis, R., Ma, C., Lossos, I., Rosenwald, A., Boldrick, J., Sabet, H., Tran, T., Yu, X., et al. (2000). Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403, 503--511.
|
| |
2
|
|
| |
3
|
|
| |
4
|
Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1, 183--187.
|
| |
5
|
Fan, K. (1949). On a theorem of Weyl concerning eigenvalues of linear transformations. Proc. Natl. Acad. Sci. USA, 35, 652--655.
|
| |
6
|
|
| |
7
|
Goldstein, H. (1980). Classical mechanics. Addison-Wesley. 2nd edition.
|
| |
8
|
|
| |
9
|
Gordon, A., & Henderson, J. (1977). An algorithm for euclidean sum of squares classification. Biometrics, 355--362.
|
| |
10
|
|
| |
11
|
|
| |
12
|
Hastie, T., Tibshirani, R., & Friedman, J. (2001). Elements of statistical learning. Springer Verlag.
|
| |
13
|
|
| |
14
|
Jolliffe, I. (2002). Principal component analysis. Springer. 2nd edition.
|
| |
15
|
Lloyd, S. (1957). Least squares quantization in pcm. Bell Telephone Laboratories Paper, Marray Hill.
|
| |
16
|
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proc. 5th Berkeley Symposium, 281--297.
|
| |
17
|
|
| |
18
|
Ng, A., Jordan, M., & Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. Proc. Neural Info. Processing Systems (NIPS 2001).
|
| |
19
|
|
| |
20
|
Zha, H., Ding, C., Gu, M., He, X., & Simon, H. (2001). Spectral relaxation for K-means clustering. Advances in Neural Information Processing Systems 14 (NIPS'01), 1057--1064.
|
CITED BY 10
|
Le Song , Alex Smola , Arthur Gretton , Karsten M. Borgwardt, A dependence maximization view of clustering, Proceedings of the 24th international conference on Machine learning, p.815-822, June 20-24, 2007, Corvalis, Oregon
|
|
Chris Ding , Ding Zhou , Xiaofeng He , Hongyuan Zha, R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization, Proceedings of the 23rd international conference on Machine learning, p.281-288, June 25-29, 2006, Pittsburgh, Pennsylvania
|
|
|
|
|
|
Hao Cheng , Kien A Hua , Khanh Vu , Danzhou Liu, Semi-supervised dimensionality reduction in image feature space, Proceedings of the 2008 ACM symposium on Applied computing, March 16-20, 2008, Fortaleza, Ceara, Brazil
|
|
|
|
Chris Ding , Tao Li , Wei Peng , Haesun Park, Orthogonal nonnegative matrix t-factorizations for clustering, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
|
|
Yun Chi , Xiaodan Song , Dengyong Zhou , Koji Hino , Belle L. Tseng, Evolutionary spectral clustering by incorporating temporal smoothness, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
|
|
|
|
|
|
|