Article
DOI: 10.1145/1015330.1015368 · ICML Conference Proceedings

Bayesian inference for transductive learning of kernel matrix using the Tanner-Wong data augmentation algorithm

Published: 04 July 2004

Abstract

In kernel methods, an interesting recent development seeks to learn a good kernel from empirical data automatically. In this paper, by regarding the transductive learning of the kernel matrix as a missing data problem, we propose a Bayesian hierarchical model for the problem and devise the Tanner-Wong data augmentation algorithm for making inference on the model. The Tanner-Wong algorithm is closely related to Gibbs sampling, and it also bears a strong resemblance to the expectation-maximization (EM) algorithm. For an efficient implementation, we propose a simplified Bayesian hierarchical model and the corresponding Tanner-Wong algorithm. We express the relationship between the kernel on the input space and the kernel on the output space as a symmetric-definite generalized eigenproblem. Based on this eigenproblem, an efficient approach to choosing the base kernel matrices is presented. The effectiveness of our Bayesian model with the Tanner-Wong algorithm is demonstrated through some classification experiments showing promising results.
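The data augmentation scheme named in the abstract alternates two sampling steps: an imputation (I) step that draws the missing data given the current parameter, and a posterior (P) step that draws the parameter given the completed data. The following is a minimal sketch of that two-step loop on a toy Gaussian missing-data problem, not the paper's kernel-matrix model; the toy setup and all variable names are illustrative assumptions.

```python
import random
import statistics

random.seed(0)

# Toy setup (illustrative, not the paper's model): 50 observed values from
# N(theta, 1) with a flat prior on theta, plus 20 missing observations.
y_obs = [random.gauss(2.0, 1.0) for _ in range(50)]
n_mis = 20

theta = 0.0          # arbitrary starting value
draws = []
for t in range(2000):
    # I-step: impute the missing data from p(z | theta)
    z = [random.gauss(theta, 1.0) for _ in range(n_mis)]
    # P-step: draw theta from its posterior given the completed data;
    # with a flat prior this is N(mean of completed data, 1/n)
    full = y_obs + z
    theta = random.gauss(statistics.fmean(full), (1.0 / len(full)) ** 0.5)
    if t >= 500:     # discard burn-in
        draws.append(theta)

# Marginally, the chain targets the observed-data posterior
# N(mean(y_obs), 1/len(y_obs)), so the draws center on mean(y_obs).
post_mean = statistics.fmean(draws)
```

Each step is a draw from a full conditional of the joint posterior over parameter and missing data, which is why the abstract relates the scheme to Gibbs sampling; replacing the draws with conditional expectations recovers the EM analogy. In the paper's setting, the "missing data" would instead be the unobserved block of the kernel matrix.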

References

[1] Amari, S. (1995). Information geometry of the EM and em algorithms for neural networks. Neural Networks, 8, 1379--1408.
[2] Bousquet, O., & Herrmann, D. J. L. (2003). On the complexity of learning the kernel matrix. Advances in Neural Information Processing Systems 15. Cambridge, MA: MIT Press.
[3] Crammer, K., Keshet, J., & Singer, Y. (2003). Kernel design using boosting. Advances in Neural Information Processing Systems 15. Cambridge, MA: MIT Press.
[4] Cristianini, N., Kandola, J., Elisseeff, A., & Shawe-Taylor, J. (2002). On kernel target alignment. Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press.
[5] Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1--38.
[6] Golub, G. H., & Van Loan, C. F. (1996). Matrix computations (3rd ed.). Baltimore: The Johns Hopkins University Press.
[7] Gupta, A., & Nagar, D. (2000). Matrix variate distributions. Boca Raton, FL: Chapman & Hall/CRC.
[8] Horn, R. A., & Johnson, C. R. (1985). Matrix analysis. Cambridge, UK: Cambridge University Press.
[9] Kandola, J., Shawe-Taylor, J., & Cristianini, N. (2002). Optimizing kernel alignment over combinations of kernels (Technical Report 2002-121). NeuroCOLT.
[10] Lanckriet, G. R. G., Cristianini, N., El Ghaoui, L., Bartlett, P., & Jordan, M. I. (2002). Learning the kernel matrix with semi-definite programming. Proceedings of the 19th International Conference on Machine Learning (pp. 323--330). Sydney, Australia.
[11] Schafer, J. L. (1997). Analysis of incomplete multivariate data. Chapman & Hall.
[12] Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association, 82, 528--550.
[13] Tsuda, K., Akaho, S., & Asai, K. (2003). The em algorithm for kernel matrix completion with auxiliary data. Journal of Machine Learning Research, 4, 67--81.
[14] Zhang, Z. (2003). Learning metrics via discriminant kernels and multidimensional scaling: Toward expected Euclidean representation. Proceedings of the 20th International Conference on Machine Learning (pp. 872--879). Washington, DC, USA.
[15] Zhang, Z., Kwok, J. T., Yeung, D. Y., & Xiong, Y. (2003a). Bayesian transductive learning of the kernel matrix using Wishart processes (Technical Report HKUST-CS03-09). Department of Computer Science, Hong Kong University of Science and Technology. Available from ftp://ftp.cs.ust.hk/pub/techreport/03/tr03-09.ps.gz.
[16] Zhang, Z., Yeung, D. Y., & Kwok, J. T. (2003b). Gaussian-Wishart processes: A statistical view of kernels and its application to kernel learning (Technical Report HKUST-CS03-15). Department of Computer Science, Hong Kong University of Science and Technology. Available from ftp://ftp.cs.ust.hk/pub/techreport/03/tr03-15.ps.


    Published In

ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning
July 2004, 934 pages
ISBN: 1581138385
DOI: 10.1145/1015330
Conference Chair: Carla Brodley
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

    Acceptance Rates

Overall acceptance rate: 140 of 548 submissions (26%)


    Cited By

• (2012) Scaling the kernel function based on the separating boundary in input space. Information Sciences, 184(1), 140--154. DOI: 10.1016/j.ins.2011.08.028
• (2011) A survey of the state of the art in learning the kernels. Knowledge and Information Systems, 31(2), 193--221. DOI: 10.1007/s10115-011-0404-6
• (2010) Graph sharpening. Expert Systems with Applications, 37(12), 7870--7879. DOI: 10.1016/j.eswa.2010.04.050
• (2008) Learning element similarity matrix for semi-structured document analysis. Knowledge and Information Systems, 19(1), 53--78. DOI: 10.1007/s10115-008-0138-2
• (2007) A kernel approach for semisupervised metric learning. IEEE Transactions on Neural Networks, 18(1), 141--149. DOI: 10.1109/TNN.2006.883723
• (2007) Kernel resolution synthesis for superresolution. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), I-553--I-556. DOI: 10.1109/ICASSP.2007.365967
• (2007) Learning the kernel matrix by maximizing a KFD-based class separability criterion. Pattern Recognition, 40(7), 2021--2028. DOI: 10.1016/j.patcog.2006.12.031
• (2006) Efficient hyperkernel learning using second-order cone programming. IEEE Transactions on Neural Networks, 17(1), 48--58. DOI: 10.1109/TNN.2005.860848
• (2006) Model-based transductive learning of the kernel matrix. Machine Learning, 63(1), 69--101. DOI: 10.1007/s10994-006-6130-8
• (2006) Graph based semi-supervised learning with sharper edges. Proceedings of the 17th European Conference on Machine Learning, 401--412. DOI: 10.1007/11871842_39
