
A DC-programming algorithm for kernel selection

Published: 25 June 2006

ABSTRACT

We address the problem of learning a kernel for a given supervised learning task. Our approach consists in searching within the convex hull of a prescribed set of basic kernels for one which minimizes a convex regularization functional. A unique feature of this approach compared to others in the literature is that the number of basic kernels can be infinite. We only require that they are continuously parameterized. For example, the basic kernels could be isotropic Gaussians with variance in a prescribed interval or even Gaussians parameterized by multiple continuous parameters. Our work builds upon a formulation involving a minimax optimization problem and a recently proposed greedy algorithm for learning the kernel. Although this optimization problem is not convex, it belongs to the larger class of DC (difference of convex functions) programs. Therefore, we apply recent results from DC optimization theory to create a new algorithm for learning the kernel. Our experimental results on benchmark data sets show that this algorithm outperforms a previously proposed method.
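The key idea above — minimizing an objective that is not convex but can be written as a difference of two convex functions — admits a simple iterative scheme, often called DCA: linearize the concave part at the current iterate and solve the resulting convex surrogate. The sketch below illustrates this generic scheme on a toy one-dimensional problem; it is not the paper's kernel-selection algorithm (whose convex subproblems involve the regularization functional over the convex hull of basic kernels), and the function names `grad_h` and `argmin_g_linear` are illustrative placeholders.

```python
def dca(grad_h, argmin_g_linear, x0, max_iter=100, tol=1e-8):
    """Generic DCA loop for minimizing f(x) = g(x) - h(x), g and h convex.

    At each step the concave part -h is replaced by its linearization at
    the current iterate x_k (via a subgradient y_k of h), and the convex
    surrogate g(x) - y_k * x is minimized exactly.
    """
    x = x0
    for _ in range(max_iter):
        y = grad_h(x)               # subgradient of h at the current iterate
        x_new = argmin_g_linear(y)  # exact minimizer of g(x) - y * x
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def sign(x):
    return (x > 0) - (x < 0)

# Toy DC program: f(x) = x**2 - |x|, with g(x) = x**2 and h(x) = |x|.
# argmin_x (x**2 - y*x) = y/2, so each DCA step maps x to sign(x)/2,
# which is a critical point of f (here the global minimizer 1/2).
x_star = dca(grad_h=sign, argmin_g_linear=lambda y: y / 2.0, x0=3.0)
```

Starting from x0 = 3.0, the first step already lands on the fixed point 0.5, matching the general DCA guarantee that the objective value is non-increasing and iterates converge to a critical point of the DC program.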


Published in

ICML '06: Proceedings of the 23rd International Conference on Machine Learning
June 2006, 1154 pages
ISBN: 1595933832
DOI: 10.1145/1143844

Copyright © 2006 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rate

ICML '06 paper acceptance rate: 140 of 548 submissions, 26%
