A DC-programming algorithm for kernel selection

ABSTRACT
We address the problem of learning a kernel for a given supervised learning task. Our approach consists of searching within the convex hull of a prescribed set of basic kernels for one that minimizes a convex regularization functional. A distinctive feature of this approach compared with others in the literature is that the number of basic kernels may be infinite; we require only that they be continuously parameterized. For example, the basic kernels could be isotropic Gaussians with variance in a prescribed interval, or even Gaussians parameterized by multiple continuous parameters. Our work builds upon a formulation involving a minimax optimization problem and a recently proposed greedy algorithm for learning the kernel. Although this optimization problem is not convex, it belongs to the larger class of DC (difference of convex functions) programs. We therefore apply recent results from DC optimization theory to derive a new algorithm for learning the kernel. Experimental results on benchmark data sets show that this algorithm outperforms a previously proposed method.
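To make the setup concrete, the sketch below illustrates learning a convex combination of Gaussian kernels at toy scale. It is not the DC algorithm of this paper: as simplifications, it discretizes the continuous variance interval into a finite grid, uses kernel ridge regression (square loss) as the underlying learner, and updates the kernel weights with a conditional-gradient (Frank-Wolfe) step in the spirit of the greedy algorithm mentioned above. All names here (`gaussian_gram`, `learn_kernel_weights`, `sigma2_grid`, `mu`) are illustrative, not taken from the paper.

```python
import numpy as np

def gaussian_gram(X, sigma2):
    """Gram matrix of the isotropic Gaussian kernel with variance sigma2."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma2))

def learn_kernel_weights(X, y, sigma2_grid, mu=0.1, n_iter=100):
    """Minimize Q(lam) = mu * y^T (K(lam) + mu*I)^{-1} y over the simplex,
    where K(lam) = sum_j lam_j * G_j, by conditional-gradient steps."""
    grams = [gaussian_gram(X, s2) for s2 in sigma2_grid]
    m, n = len(grams), X.shape[0]
    lam = np.full(m, 1.0 / m)                       # uniform starting weights
    for t in range(n_iter):
        K = sum(l * G for l, G in zip(lam, grams))
        c = np.linalg.solve(K + mu * np.eye(n), y)  # ridge coefficients
        # dQ/dlam_j = -mu * c^T G_j c, so the best descent vertex of the
        # simplex is the basic kernel with the largest value of c^T G_j c.
        scores = np.array([c @ G @ c for G in grams])
        vertex = np.zeros(m)
        vertex[np.argmax(scores)] = 1.0
        gamma = 2.0 / (t + 2.0)                     # standard FW step size
        lam = (1.0 - gamma) * lam + gamma * vertex
    return lam, c
```

For instance, `learn_kernel_weights(X, y, np.logspace(-2, 2, 20))` would search over twenty Gaussian variances between 10^-2 and 10^2. With the square loss, the regularization functional reduces to Q(lam) = mu * y^T (K(lam) + mu*I)^{-1} y, which is convex in the kernel, so each greedy step adds weight to a single basic kernel; the DC algorithm studied in the paper instead handles the non-convex minimax problem over the continuous kernel parameters directly, without fixing a grid in advance.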