ABSTRACT
We devise a boosting approach to classification and regression based on column generation using a mixture of kernels. Traditional kernel methods construct models from a single positive semi-definite kernel whose type is predefined and whose parameters are chosen by cross-validation performance. Our approach instead creates models that are mixtures drawn from a library of kernel models, and our algorithm automatically determines which kernels are used in the final model. Both 1-norm and 2-norm regularization are employed to restrict the ensemble of kernel models. The proposed method produces sparser solutions and thus significantly reduces testing time. By extending column generation (CG) optimization, previously available for linear programs with 1-norm regularization, to quadratic programs with 2-norm regularization, we are able to solve many learning formulations by leveraging various algorithms for constructing single kernel models. By assigning different priorities to the columns to be generated, we scale CG boosting to large datasets. Experimental results on benchmark data demonstrate the method's effectiveness.
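The column-generation idea can be made concrete with a small sketch. It is not the paper's formulation. It assumes a squared loss with 1-norm regularization on the kernel coefficients, so the restricted master problem is a lasso solved by coordinate descent rather than the LP or QP described above. The kernel library (one linear kernel plus RBF kernels with illustrative widths `gammas`), the regularization weight `lam`, the absence of a bias term, and the function names `build_kernel_library`, `solve_master` and `cg_boost` are all hypothetical. Each candidate column is one kernel evaluated at one training point, K_j(x_i, ·). The pricing step adds the column that most violates the optimality condition |h^T r| ≤ λ, where r is the current residual, and the loop stops when no column violates it. The column-priority scheme for large datasets is not modeled.

```python
import numpy as np

# Sketch only: squared loss + 1-norm (lasso) master problem, not the paper's
# LP/QP formulations. All names, kernels and parameters are hypothetical.


def rbf_kernel(X, Z, gamma):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||x - z||^2)."""
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def build_kernel_library(X, gammas=(0.1, 1.0, 10.0)):
    """Candidate columns K_j(x_i, .) for every kernel j and training point x_i.

    Hypothetical library: one linear kernel plus RBF kernels of several widths.
    Returns H with shape (n, n * num_kernels) and one (kernel, point) tag per column.
    """
    n = len(X)
    blocks = [X @ X.T]
    tags = [("linear", i) for i in range(n)]
    for g in gammas:
        blocks.append(rbf_kernel(X, X, g))
        tags += [(f"rbf(gamma={g})", i) for i in range(n)]
    return np.hstack(blocks), tags


def solve_master(H, y, lam, active, a, n_sweeps=100):
    """Restricted master problem: lasso on the active columns only.

    Minimizes 0.5 * ||y - H a||^2 + lam * ||a||_1 over a[active] by cyclic
    coordinate descent, warm-started from the current coefficients a.
    """
    r = y - H @ a
    for _ in range(n_sweeps):
        for j in active:
            h = H[:, j]
            hh = h @ h
            if hh == 0.0:
                continue
            rho = h @ r + hh * a[j]  # correlation with the partial residual
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / hh  # soft-threshold
            r += h * (a[j] - new)  # keep the residual consistent with the update
            a[j] = new
    return a, r


def cg_boost(H, y, lam=0.1, max_cols=20, tol=1e-6):
    """Column generation: grow the active set one violating column at a time."""
    a = np.zeros(H.shape[1])
    r = np.asarray(y, dtype=float).copy()
    active = []
    while len(active) < max_cols:
        # Pricing: a column with a_j = 0 is optimal only if |h_j^T r| <= lam.
        scores = np.abs(H.T @ r)
        scores[active] = 0.0  # columns already in the master are skipped
        j = int(np.argmax(scores))
        if scores[j] <= lam + tol:
            break  # no violated column: (approximately) optimal over all columns
        active.append(j)
        a, r = solve_master(H, y, lam, active, a)
    return a, active


# Usage on a toy 1-D regression problem (illustrative data, no bias term).
X = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)
y = np.sin(3.0 * X[:, 0])
H, tags = build_kernel_library(X)
a, active = cg_boost(H, y)
used = [tags[j] for j in active if a[j] != 0.0]
print(f"{len(used)} of {H.shape[1]} candidate columns used")
print(sorted({kernel for kernel, _ in used}))
```

In this sketch the pricing step chooses both a kernel type and a basis point at once. Kernels whose columns never violate the condition never enter the model, which is how the loop selects kernels automatically and keeps the fitted model sparse. Replacing the lasso master with an LP or QP solver would mainly change how the pricing scores are computed, since they would come from that solver's dual variables instead of the residual. The structure of the loop would stay the same.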