skip to main content
10.1145/1055558.1055581acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

k-means projective clustering

Published: 14 June 2004 Publication History

Abstract

In many applications it is desirable to cluster high dimensional data along various subspaces, which we refer to as projective clustering. We propose a new objective function for projective clustering, taking into account the inherent trade-off between the dimension of a subspace and the induced clustering error. We then present an extension of the k-means clustering algorithm for projective clustering in arbitrary subspaces, and also propose techniques to avoid local minima. Unlike previous algorithms, ours can choose the dimension of each cluster independently and automatically. Furthermore, experimental results show that our algorithm is significantly more accurate than the previous approaches.

References

[1]
P. Agarwal and S. Har-Peled, Maintaining the approximate extent measures of moving points, Proc. 12th ACM-SIAM Sympos. Discrete Algorithms, 2001, pp. 148--157.]]
[2]
P. K. Agarwal, S. Har-Peled, N. Mustafa, and Y. Wang, Nearlinear time approximation algorithms for curve simplification in two and three dimensions, Proc. of the 10th European Symposium on Algorithms, 2002, pp. 544--555.]]
[3]
C. Aggarwal and P. Yu, Finding generalized projected clusters in high dimensional spaces, Proc. ACM-SIGMOD Intl. Conf. Management of Data, 2000, pp. 70--81.]]
[4]
C. C. Aggarwal, C. M. Procopiuc, J. L. Wolf, P. S. Yu, and J. S. Park, Fast algorithms for projected clustering, Proc. ACM SIGMOD Intl. Conf. on Management of Data, 1999, pp. 61--72.]]
[5]
R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, Automatic subspace clustering of high dimensional data for data mining applications, Proc. ACM-SIGMOD Intl. Conf. Management of Data, 1998, pp. 94--105.]]
[6]
K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, When is "nearest neighbor" meaningful?, Lecture Notes in Computer Science, 1540 (1999), 217--235.]]
[7]
K. Chakrabarti and S. Mehrotra, Local dimensionality reduction: A new approach to indexing high dimensional spaces, Proc. of 26th Intl. Conf. on Very Large Data Bases, 2000, pp. 89--100.]]
[8]
Q. Du, V. Faber, and M. Gunzburger, Centroidal voronoi tessellations: Applications and algorithms, SIAM Review, 41 (1999), 637--676.]]
[9]
S. Guha, R. Rastogi, and K. Shim, CURE: an efficient clustering algorithm for large databases, Proc. ACM-SIGMOD Intl. Conf. Management of Data, 1998, pp. 73--84.]]
[10]
J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.]]
[11]
S. Har-Peled and K. Varadarajan, Approximate shape fitting via linearization, Proc. 42nd Annu. IEEE Sympos. Found. Comput. Sci., 2001, pp. 66--73.]]
[12]
R. M. Heiberger, Algorithm AS 127: Generation of random orthogonal matrices, Appl. Statist., 27 (1978), 199--206.]]
[13]
A. Hinneburg, C. C. Aggarwal, and D. A. Keim, What is the nearest neighbor in high dimensional spaces?, Proc. 26th Intl. Conf. Very Large DataBases, 2000, pp. 506--515.]]
[14]
A. Hinneburg and D. A. Keim, Optimal grid-clustering:towards breaking the curse of dimensionality in high-dimensional clustering, Proc. of 25th Intl. Conf. Very Large DataBases, 1999, pp. 506--517.]]
[15]
A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ, 1988.]]
[16]
W. Johnson and J. Lindenstrauss, Extensions of Lipschitz maps into a Hilbert space, Contemp. Math., 26 (1984), 189--206.]]
[17]
T. T. Jolliffe, Principal component analysis, Springer-Verlag, New York, 2002.]]
[18]
R. T. Ng and J. Han, Efficient and effective clustering methods for spatial data mining, Intl. Conf. Very Large DataBases, 1994, pp. 144--155.]]
[19]
M. Procopiuc, M. Jones. P. K. Agarwal, and T. M. Murali, A monte carlo algorithm for fast projective clustering, Proc. ACM-SIGMOD Intl. Conf. Management of Data, 2002, pp. 418--427.]]
[20]
T. Zhang, R. Ramakrishnan, and M. Livny, BIRCH: an efficient data clustering method for very large databases, Proc. ACM-SIGMOD Intl. Conf. Management of Data, 1996, pp. 103--114.]]

Cited By

View all
  • (2024)Fusion over the Grassmannian for High-Rank Matrix Completion2024 IEEE International Symposium on Information Theory (ISIT)10.1109/ISIT57864.2024.10619457(1550-1555)Online publication date: 7-Jul-2024
  • (2024)Outlier Detect using Vector Cosine Similarity by Adding a Dimension2024 International Conference on Artificial Intelligence in Information and Communication (ICAIIC)10.1109/ICAIIC60209.2024.10463442(255-259)Online publication date: 19-Feb-2024
  • (2024)TSCAPE: time series clustering with curve analysis and projection on an Euclidean spaceConnection Science10.1080/09540091.2024.231211936:1Online publication date: 14-Feb-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PODS '04: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2004
350 pages
ISBN:158113858X
DOI:10.1145/1055558
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2004

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

SIGMOD/PODS04

Acceptance Rates

Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)58
  • Downloads (Last 6 weeks)4
Reflects downloads up to 31 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Fusion over the Grassmannian for High-Rank Matrix Completion2024 IEEE International Symposium on Information Theory (ISIT)10.1109/ISIT57864.2024.10619457(1550-1555)Online publication date: 7-Jul-2024
  • (2024)Outlier Detect using Vector Cosine Similarity by Adding a Dimension2024 International Conference on Artificial Intelligence in Information and Communication (ICAIIC)10.1109/ICAIIC60209.2024.10463442(255-259)Online publication date: 19-Feb-2024
  • (2024)TSCAPE: time series clustering with curve analysis and projection on an Euclidean spaceConnection Science10.1080/09540091.2024.231211936:1Online publication date: 14-Feb-2024
  • (2024)Determining the Number of Clusters in Clinical Response of TMS Treatment using Hyperdimensional ComputingJournal of Signal Processing Systems10.1007/s11265-024-01921-y96:8-9(509-523)Online publication date: 26-Jun-2024
  • (2023)Tight bounds for volumetric spanners and applicationsProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3666165(916-930)Online publication date: 10-Dec-2023
  • (2023)K-Subspaces for Sequential Data2023 IEEE 9th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)10.1109/CAMSAP58249.2023.10403417(496-500)Online publication date: 10-Dec-2023
  • (2022)Fully Unsupervised Person Re-Identification via Centroids and Neighborhoods Joint Learning2022 IEEE 31st International Symposium on Industrial Electronics (ISIE)10.1109/ISIE51582.2022.9831570(1127-1132)Online publication date: 1-Jun-2022
  • (2022)Data-driven Kernel Subspace Clustering with Local Manifold Preservation2022 IEEE International Conference on Data Mining Workshops (ICDMW)10.1109/ICDMW58026.2022.00116(876-884)Online publication date: Nov-2022
  • (2022)Three-dimensional Data Outlier Detected by Angle Analysis2022 International Conference on Artificial Intelligence in Information and Communication (ICAIIC)10.1109/ICAIIC54071.2022.9722637(446-450)Online publication date: 21-Feb-2022
  • (2022)Multi-Scale Deep Subspace Clustering With Discriminative LearningIEEE Access10.1109/ACCESS.2022.320048210(91283-91293)Online publication date: 2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media