ABSTRACT
Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, before clustering. Here, we consider constructing such projections using multiple views of the data, via Canonical Correlation Analysis (CCA).
Under the assumption that the views are uncorrelated given the cluster label, we show that the separation conditions required for the algorithm to be successful are significantly weaker than prior results in the literature. We provide results for mixtures of Gaussians and mixtures of log-concave distributions. We also provide empirical support from audio-visual speaker clustering (where we desire the clusters to correspond to speaker ID) and from hierarchical Wikipedia document clustering (where one view is the words in the document and the other is the link structure).
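The pipeline the abstract describes — learn a low-dimensional projection by correlating two views, then cluster in the projected space — can be sketched as follows. This is a minimal NumPy illustration, not the paper's algorithm: the synthetic two-view mixture, the ridge term `reg`, and the crude 1-D thresholding step standing in for a real clustering routine are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-view mixture of two Gaussians. Given the hidden cluster
# label, the two views are independent (the paper's key assumption).
n = 400
labels = rng.integers(0, 2, n)
means_a = np.array([[0.0, 0.0, 0.0], [3.0, 3.0, 0.0]])  # view A cluster means
means_b = np.array([[0.0, 0.0], [3.0, -3.0]])           # view B cluster means
A = means_a[labels] + rng.normal(size=(n, 3))
B = means_b[labels] + rng.normal(size=(n, 2))

def cca_directions(X, Y, k=1, reg=1e-6):
    """Top-k canonical directions via SVD of the whitened cross-covariance."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    Cxx = Xc.T @ Xc / len(X) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / len(Y) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / len(X)

    def inv_sqrt(C):
        # Inverse symmetric square root, for whitening each view.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    return Wx @ U[:, :k], Wy @ Vt.T[:, :k]

U, V = cca_directions(A, B, k=1)
proj = (A - A.mean(0)) @ U  # project view A onto the top canonical direction

# Crude 1-D clustering: split at the median projection (clusters are
# balanced here); a real pipeline would run k-means in the CCA subspace.
pred = (proj[:, 0] > np.median(proj[:, 0])).astype(int)

# Accuracy up to label permutation.
acc = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"accuracy along CCA direction: {acc:.2f}")
```

Because the within-cluster noise is uncorrelated across the two views, the top canonical direction aligns with the subspace spanned by the cluster means rather than with high-variance noise directions — which is why the separation requirements can be weaker than for single-view PCA-based projections.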
Index Terms
- Multi-view clustering via canonical correlation analysis