Article
DOI: 10.1145/1015330.1015368 · ICML Conference Proceedings

Bayesian inference for transductive learning of kernel matrix using the Tanner-Wong data augmentation algorithm

Published: 04 July 2004

Abstract

In kernel methods, an interesting recent development seeks to learn a good kernel from empirical data automatically. In this paper, by regarding the transductive learning of the kernel matrix as a missing data problem, we propose a Bayesian hierarchical model for the problem and devise the Tanner-Wong data augmentation algorithm for making inference on the model. The Tanner-Wong algorithm is closely related to Gibbs sampling, and it also bears a strong resemblance to the expectation-maximization (EM) algorithm. For an efficient implementation, we propose a simplified Bayesian hierarchical model and the corresponding Tanner-Wong algorithm. We express the relationship between the kernel on the input space and the kernel on the output space as a symmetric-definite generalized eigenproblem. Based on this eigenproblem, an efficient approach to choosing the base kernel matrices is presented. The effectiveness of our Bayesian model with the Tanner-Wong algorithm is demonstrated through some classification experiments showing promising results.
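The data augmentation scheme named in the abstract alternates two sampling steps: an imputation (I) step that draws the missing data given the current parameter, and a posterior (P) step that draws the parameter given the completed data. The following is a minimal sketch of that two-step loop on a toy Gaussian missing-data problem, not the paper's kernel-matrix model; the toy setup and all variable names are illustrative assumptions.

```python
import random
import statistics

random.seed(0)

# Toy setup (illustrative, not the paper's model): 50 observed values from
# N(theta, 1) with a flat prior on theta, plus 20 missing observations.
y_obs = [random.gauss(2.0, 1.0) for _ in range(50)]
n_mis = 20

theta = 0.0          # arbitrary starting value
draws = []
for t in range(2000):
    # I-step: impute the missing data from p(z | theta)
    z = [random.gauss(theta, 1.0) for _ in range(n_mis)]
    # P-step: draw theta from its posterior given the completed data;
    # with a flat prior this is N(mean of completed data, 1/n)
    full = y_obs + z
    theta = random.gauss(statistics.fmean(full), (1.0 / len(full)) ** 0.5)
    if t >= 500:     # discard burn-in
        draws.append(theta)

# Marginally, the chain targets the observed-data posterior
# N(mean(y_obs), 1/len(y_obs)), so the draws center on mean(y_obs).
post_mean = statistics.fmean(draws)
```

Each step is a draw from a full conditional of the joint posterior over parameter and missing data, which is why the abstract relates the scheme to Gibbs sampling; replacing the draws with conditional expectations recovers the EM analogy. In the paper's setting, the "missing data" would instead be the unobserved block of the kernel matrix.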

References

[1] Amari, S. (1995). Information geometry of the EM and em algorithms for neural networks. Neural Networks, 8, 1379--1408.
[2] Bousquet, O., & Herrmann, D. J. L. (2003). On the complexity of learning the kernel matrix. Advances in Neural Information Processing Systems 15. Cambridge, MA: MIT Press.
[3] Crammer, K., Keshet, J., & Singer, Y. (2003). Kernel design using boosting. Advances in Neural Information Processing Systems 15. Cambridge, MA: MIT Press.
[4] Cristianini, N., Kandola, J., Elisseeff, A., & Shawe-Taylor, J. (2002). On kernel target alignment. Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press.
[5] Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1--38.
[6] Golub, G. H., & Van Loan, C. F. (1996). Matrix computations (3rd ed.). Baltimore: The Johns Hopkins University Press.
[7] Gupta, A., & Nagar, D. (2000). Matrix variate distributions. Boca Raton, FL: Chapman & Hall/CRC.
[8] Horn, R. A., & Johnson, C. R. (1985). Matrix analysis. Cambridge, UK: Cambridge University Press.
[9] Kandola, J., Shawe-Taylor, J., & Cristianini, N. (2002). Optimizing kernel alignment over combinations of kernels (Technical Report 2002-121). NeuroCOLT.
[10] Lanckriet, G. R. G., Cristianini, N., El Ghaoui, L., Bartlett, P., & Jordan, M. I. (2002). Learning the kernel matrix with semi-definite programming. Proceedings of the 19th International Conference on Machine Learning (pp. 323--330). Sydney, Australia.
[11] Schafer, J. L. (1997). Analysis of incomplete multivariate data. Chapman & Hall.
[12] Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association, 82, 528--550.
[13] Tsuda, K., Akaho, S., & Asai, K. (2003). The em algorithm for kernel matrix completion with auxiliary data. Journal of Machine Learning Research, 4, 67--81.
[14] Zhang, Z. (2003). Learning metrics via discriminant kernels and multidimensional scaling: Toward expected Euclidean representation. Proceedings of the 20th International Conference on Machine Learning (pp. 872--879). Washington, DC, USA.
[15] Zhang, Z., Kwok, J. T., Yeung, D. Y., & Xiong, Y. (2003a). Bayesian transductive learning of the kernel matrix using Wishart processes (Technical Report HKUST-CS03-09). Department of Computer Science, Hong Kong University of Science and Technology. Available from ftp://ftp.cs.ust.hk/pub/techreport/03/tr03-09.ps.gz.
[16] Zhang, Z., Yeung, D. Y., & Kwok, J. T. (2003b). Gaussian-Wishart processes: A statistical view of kernels and its application to kernel learning (Technical Report HKUST-CS03-15). Department of Computer Science, Hong Kong University of Science and Technology. Available from ftp://ftp.cs.ust.hk/pub/techreport/03/tr03-15.ps.


    Published In

ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning
July 2004, 934 pages
ISBN: 1581138385
DOI: 10.1145/1015330
Conference Chair: Carla Brodley
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

    Acceptance Rates

Overall acceptance rate: 140 of 548 submissions (26%)


    Cited By

• (2012) Scaling the kernel function based on the separating boundary in input space. Information Sciences, 184(1), 140--154. DOI: 10.1016/j.ins.2011.08.028
• (2011) A survey of the state of the art in learning the kernels. Knowledge and Information Systems, 31(2), 193--221. DOI: 10.1007/s10115-011-0404-6
• (2010) Graph sharpening. Expert Systems with Applications, 37(12), 7870--7879. DOI: 10.1016/j.eswa.2010.04.050
• (2008) Learning element similarity matrix for semi-structured document analysis. Knowledge and Information Systems, 19(1), 53--78. DOI: 10.1007/s10115-008-0138-2
• (2007) A kernel approach for semisupervised metric learning. IEEE Transactions on Neural Networks, 18(1), 141--149. DOI: 10.1109/TNN.2006.883723
• (2007) Kernel resolution synthesis for superresolution. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), I-553--I-556. DOI: 10.1109/ICASSP.2007.365967
• (2007) Learning the kernel matrix by maximizing a KFD-based class separability criterion. Pattern Recognition, 40(7), 2021--2028. DOI: 10.1016/j.patcog.2006.12.031
• (2006) Efficient hyperkernel learning using second-order cone programming. IEEE Transactions on Neural Networks, 17(1), 48--58. DOI: 10.1109/TNN.2005.860848
• (2006) Model-based transductive learning of the kernel matrix. Machine Learning, 63(1), 69--101. DOI: 10.1007/s10994-006-6130-8
• (2006) Graph based semi-supervised learning with sharper edges. Proceedings of the 17th European Conference on Machine Learning, 401--412. DOI: 10.1007/11871842_39
