ACM Home Page
Please provide us with feedback. Feedback
An information theoretic analysis of maximum likelihood mixture estimation for exponential families
Full text PdfPdf (267 KB)
Source ACM International Conference Proceeding Series; Vol. 69 archive
Proceedings of the twenty-first international conference on Machine learning table of contents
Banff, Alberta, Canada
Page: 8  
Year of Publication: 2004
ISBN:1-58113-828-5
Authors
Arindam Banerjee  Univ of Texas at Austin, Austin, TX
Inderjit Dhillon  Univ of Texas at Austin, Austin, TX
Joydeep Ghosh  Univ of Texas at Austin, Austin, TX
Srujana Merugu  Univ of Texas at Austin, Austin, TX
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 2,   Downloads (12 Months): 12,   Citation Count: 2
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1015330.1015431
What is a DOI?

ABSTRACT

An important task in unsupervised learning is maximum likelihood mixture estimation (MLME) for exponential families. In this paper, we prove a mathematical equivalence between this MLME problem and the rate distortion problem for Bregman divergences. We also present new theoretical results in rate distortion theory for Bregman divergences. Further, an analysis of the problems as a trade-off between compression and preservation of information is presented that yields the information bottleneck method as an interesting special case.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
Banerjee, A., Merugu, S., Dhillon, I., & Ghosh, J. (2004). Clustering with Bregman divergences. (To appear) Proc. SIAM Intl. Conf. on Data Mining.
 
3
Berger, T. (1971). Rate distortion theory: A mathematical basis for data compression. Prentice-Hall.
 
4
Collins, M., Dasgupta, S., & Schapire, R. (2001). A generalization of principal component analysis to the exponential family. 15th NIPS.
 
5
 
6
 
7
Finamore, W. A., & Pearlman, W. A. (1980). Optimal encoding of discrete-time continuous-amplitude memoryless sources with finite output alphabet. IEEE Transactions on Information Theory, 26, 144--155.
 
8
 
9
Gilad-Bachrach, R., Navot, A., & Tishby, N. (2003). An information theoretic tradeoff between complexity and accuracy. 16th COLT.
 
10
Kearns, M., Mansour, Y., & Ng, A. (1997). An information-theoretic analysis of hard and soft assignment methods for clustering. 13th UAI.
 
11
 
12
Nigam, K. (2001). Using unlabeled data to improve text classification (Technical Report CMU-CS-01-126). Carnegie Mellon University.
 
13
Redner, R., & Walker, H. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26, 195--239.
 
14
Rockafeller, R. T. (1970). Convex analysis. Princeton University Press.
 
15
Rose, K. (1994). A mapping approach to rate-distortion computation and analysis. IEEE Transactions on Information Theory, 40, 1939--1952.
 
16
Rose, K. (1998). Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proceedings of IEEE, 86, 2210--2239.
 
17
Slonim, N., & Weiss, Y. (2002). Maximum likelihood and the information bottleneck. 16th NIPS.
 
18
Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. Proc. 37th Annual Allerton Conf. on Communication, Control and Computing (pp. 368--377).

Collaborative Colleagues:
Arindam Banerjee: colleagues
Inderjit Dhillon: colleagues
Joydeep Ghosh: colleagues
Srujana Merugu: colleagues