ABSTRACT
In this paper we perform an empirical evaluation of supervised learning on high-dimensional data. We evaluate performance on three metrics (accuracy, AUC, and squared loss) and study how increasing dimensionality affects the performance of the learning algorithms. Our findings are consistent with previous studies for problems of relatively low dimension, but suggest that as dimensionality increases the relative performance of the learning algorithms changes. To our surprise, the method that performs consistently well across all dimensions is random forests, followed by neural nets, boosted trees, and SVMs.
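The three evaluation metrics named above can be sketched in a few lines of plain Python. This is only an illustration of the metric definitions (accuracy on hard 0/1 predictions, AUC via its rank-sum interpretation, and squared loss as the mean squared error between labels and predicted probabilities, i.e. the Brier score) on made-up predictions, not the paper's evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of correct 0/1 predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def squared_loss(y_true, y_prob):
    """Mean squared error between 0/1 labels and predicted
    probabilities (the Brier score)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

def auc(y_true, y_prob):
    """AUC via the rank-sum (Mann-Whitney) formulation: the probability
    that a randomly chosen positive is scored above a randomly chosen
    negative, counting ties as half a win."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum((pp > nn) + 0.5 * (pp == nn) for pp in pos for nn in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores for illustration only.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

print(accuracy(y_true, y_pred))   # -> 0.8
print(auc(y_true, y_prob))        # -> 1.0 (every positive outscores every negative)
print(squared_loss(y_true, y_prob))
```

Note that accuracy depends on a threshold while AUC and squared loss operate on the raw scores, which is why a study such as this one reports all three.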