| Training structural svms with kernels using sampled cuts |
| Full text |
Pdf
(335 KB)
|
Source
|
International Conference on Knowledge Discovery and Data Mining
archive
Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
table of contents
Las Vegas, Nevada, USA
SESSION: Research papers
table of contents
Pages 794-802
Year of Publication: 2008
ISBN:978-1-60558-193-4
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 53, Downloads (12 Months): 103, Citation Count: 0
|
|
|
ABSTRACT
Discriminative training for structured outputs has found increasing applications in areas such as natural language processing, bioinformatics, information retrieval, and computer vision. Focusing on large-margin methods, the most general (in terms of loss function and model structure) training algorithms known to date are based on cutting-plane approaches. While these algorithms are very efficient for linear models, their training complexity becomes quadratic in the number of examples when kernels are used. To overcome this bottleneck, we propose new training algorithms that use approximate cutting planes and random sampling to enable efficient training with kernels. We prove that these algorithms have improved time complexity while providing approximation guarantees. In empirical evaluations, our algorithms produced solutions with training and test error rates close to those of exact solvers. Even on binary classification problems where highly optimized conventional training methods exist (e.g. SVM-light), our methods are about an order of magnitude faster than conventional training methods on large datasets, while remaining competitive in speed on datasets of medium size.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
R. Collobert, S. Bengio, and Y. Bengio. A parallel mixture of SVMs for very large scale problems. In NIPS, 2002.
|
| |
2
|
S. Fine and K. Scheinberg. Efficient SVM training using low-rank kernel representations. JMLR, 2:243--264, 2001.
|
| |
3
|
V. Franc and S. Sonnenburg. Optimized cutting plane algorithm for support vector machines. In ICML, 2008.
|
| |
4
|
T. Joachims. A support vector method for multivariate performance measures. In ICML, 2005.
|
| |
5
|
T. Joachims. Training linear SVMs in linear time. In KDD, 2006.
|
| |
6
|
T. Joachims, T. Finley, and C.-N. Yu. Cutting plane training of structural SVMs. MLJ, To Appear.
|
| |
7
|
S. S. Keerthi, O. Chapelle, and D. DeCoste. Building support vector machines with reduced classifier complexity. JMLR, 7:1493--1515, 2006.
|
| |
8
|
J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, chapter 12. MIT-Press, 1999.
|
| |
9
|
N. D. Ratliff, J. A. Bagnell, and M. A. Zinkevich. (Online) subgradient methods for structured prediction. In AISTATS, 2007.
|
| |
10
|
M. Saito and M. Matsumoto. SIMD-oriented fast mersenne twister: a 128-bit pseudorandom number generator. In Monte Carlo and Quasi-Monte Carlo Methods 2006, 2006.
|
| |
11
|
S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. In ICML, 2007.
|
| |
12
|
A. J. Smola, S. V. N. Vishwanathan, and Q. Le. Bundle methods for machine learning. In NIPS, 2007.
|
| |
13
|
B. Taskar, C. Guestrin, and D. Koller. Maximum-Margin Markov networks. In NIPS, 2003.
|
| |
14
|
B. Taskar, D. Klein, M. Collins, D. Koller, and C. Manning. Max-margin Parsing. In EMNLP, 2004.
|
| |
15
|
I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. JMLR, 6:1453--1484, September 2005.
|
| |
16
|
S. V. N. Vishwanathan, N. N. Schraudolph, M. W. Schmidt, and K. P. Murphy. Accelerated training of conditional random fields with stochastic gradient methods. In ICML, 2006.
|
| |
17
|
S. V. N. Vishwanathan, A. J. Smola, and M. N. Murty. SimpleSVM. In ICML, 2003.
|
| |
18
|
C. K. I. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. In NIPS, 2001.
|
| |
19
|
C.-N. Yu, T. Joachims, R. Elber, and J. Pillardy. Support vector training of protein alignment models. In RECOMB, 2007.
|
| |
20
|
Y. Yue, T. Finley, F. Radlinski, and T. Joachims. A support vector method for optimizing average precision. In SIGIR, 2007.
|
|