| An information theoretic analysis of maximum likelihood mixture estimation for exponential families |
| Full text |
Pdf
(267 KB)
|
| Source
|
ACM International Conference Proceeding Series; Vol. 69
archive
Proceedings of the twenty-first international conference on Machine learning
table of contents
Banff, Alberta, Canada
Page: 8
Year of Publication: 2004
ISBN:1-58113-828-5
|
|
Authors
|
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 2, Downloads (12 Months): 12, Citation Count: 2
|
|
|
ABSTRACT
An important task in unsupervised learning is maximum likelihood mixture estimation (MLME) for exponential families. In this paper, we prove a mathematical equivalence between this MLME problem and the rate distortion problem for Bregman divergences. We also present new theoretical results in rate distortion theory for Bregman divergences. Further, an analysis of the problems as a trade-off between compression and preservation of information is presented that yields the information bottleneck method as an interesting special case.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Banerjee, A., Merugu, S., Dhillon, I., & Ghosh, J. (2004). Clustering with Bregman divergences. (To appear) Proc. SIAM Intl. Conf. on Data Mining.
|
| |
3
|
Berger, T. (1971). Rate distortion theory: A mathematical basis for data compression. Prentice-Hall.
|
| |
4
|
Collins, M., Dasgupta, S., & Schapire, R. (2001). A generalization of principal component analysis to the exponential family. 15th NIPS.
|
| |
5
|
|
| |
6
|
|
| |
7
|
Finamore, W. A., & Pearlman, W. A. (1980). Optimal encoding of discrete-time continuous-amplitude memoryless sources with finite output alphabet. IEEE Transactions on Information Theory, 26, 144--155.
|
| |
8
|
|
| |
9
|
Gilad-Bachrach, R., Navot, A., & Tishby, N. (2003). An information theoretic tradeoff between complexity and accuracy. 16th COLT.
|
| |
10
|
Kearns, M., Mansour, Y., & Ng, A. (1997). An information-theoretic analysis of hard and soft assignment methods for clustering. 13th UAI.
|
| |
11
|
|
| |
12
|
Nigam, K. (2001). Using unlabeled data to improve text classification (Technical Report CMU-CS-01-126). Carnegie Mellon University.
|
| |
13
|
Redner, R., & Walker, H. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26, 195--239.
|
| |
14
|
Rockafeller, R. T. (1970). Convex analysis. Princeton University Press.
|
| |
15
|
Rose, K. (1994). A mapping approach to rate-distortion computation and analysis. IEEE Transactions on Information Theory, 40, 1939--1952.
|
| |
16
|
Rose, K. (1998). Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proceedings of IEEE, 86, 2210--2239.
|
| |
17
|
Slonim, N., & Weiss, Y. (2002). Maximum likelihood and the information bottleneck. 16th NIPS.
|
| |
18
|
Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. Proc. 37th Annual Allerton Conf. on Communication, Control and Computing (pp. 368--377).
|
|