ABSTRACT
In this paper we perform an empirical evaluation of supervised learning on high-dimensional data. We evaluate performance on three metrics (accuracy, AUC, and squared loss) and study how increasing dimensionality affects the performance of the learning algorithms. Our findings are consistent with previous studies for problems of relatively low dimension, but suggest that as dimensionality increases the relative performance of the learning algorithms changes. To our surprise, the method that performs consistently well across all dimensions is random forests, followed by neural nets, boosted trees, and SVMs.
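The three evaluation metrics named above can be sketched in a few lines of plain Python. This is only an illustration of the metric definitions (accuracy on hard 0/1 predictions, AUC via its rank-sum interpretation, and squared loss as the mean squared error between labels and predicted probabilities, i.e. the Brier score) on made-up predictions, not the paper's evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of correct 0/1 predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def squared_loss(y_true, y_prob):
    """Mean squared error between 0/1 labels and predicted
    probabilities (the Brier score)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

def auc(y_true, y_prob):
    """AUC via the rank-sum (Mann-Whitney) formulation: the probability
    that a randomly chosen positive is scored above a randomly chosen
    negative, counting ties as half a win."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum((pp > nn) + 0.5 * (pp == nn) for pp in pos for nn in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores for illustration only.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

print(accuracy(y_true, y_pred))   # -> 0.8
print(auc(y_true, y_prob))        # -> 1.0 (every positive outscores every negative)
print(squared_loss(y_true, y_prob))
```

Note that accuracy depends on a threshold while AUC and squared loss operate on the raw scores, which is why a study such as this one reports all three.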