ABSTRACT
For two-class classification, it is common to classify by setting a threshold on class probability estimates, where the threshold is determined by ROC curve analysis. An analog for multiclass classification is learning a new class partitioning of the multiclass probability simplex to minimize empirical misclassification costs. We analyze the interplay between systematic errors in the class probability estimates and cost matrices for multiclass classification. We explore the effect on the class partitioning of five different transformations of the cost matrix. Experiments on benchmark datasets with naive Bayes and quadratic discriminant analysis show the effectiveness of learning a new partition matrix compared to previously proposed methods.
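As a rough illustration of the setting described in the abstract, the sketch below shows the standard minimum-expected-cost decision rule applied to class probability estimates, and how the empirical cost of a decision matrix could be measured on labeled data. It is not the paper's learning algorithm: the function names (`expected_costs`, `min_cost_class`, `empirical_cost`), the convention that `cost[i][j]` is the cost of predicting class `j` when the true class is `i`, and the example numbers are all assumptions made for illustration.

```python
from typing import List, Sequence


def expected_costs(probs: Sequence[float],
                   matrix: Sequence[Sequence[float]]) -> List[float]:
    """Expected cost of predicting each class j under the estimates `probs`.

    Assumed convention: matrix[i][j] is the cost of predicting class j
    when the true class is i.
    """
    n_classes = len(matrix)
    return [
        sum(probs[i] * matrix[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]


def min_cost_class(probs: Sequence[float],
                   matrix: Sequence[Sequence[float]]) -> int:
    """Index of the class with the smallest expected cost."""
    costs = expected_costs(probs, matrix)
    return min(range(len(costs)), key=costs.__getitem__)


def empirical_cost(prob_rows: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   decision_matrix: Sequence[Sequence[float]],
                   cost: Sequence[Sequence[float]]) -> float:
    """Average true cost when decisions are made with `decision_matrix`.

    `decision_matrix` partitions the probability simplex into decision
    regions; `cost` is the matrix that is actually charged for errors.
    """
    total = 0.0
    for probs, true_class in zip(prob_rows, labels):
        predicted = min_cost_class(probs, decision_matrix)
        total += cost[true_class][predicted]
    return total / len(labels)


# Hypothetical three-class example: errors on class 2 are expensive.
cost = [[0, 1, 1],
        [1, 0, 1],
        [10, 10, 0]]
zero_one = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
probs = [0.5, 0.3, 0.2]

print(min_cost_class(probs, zero_one))  # most probable class
print(min_cost_class(probs, cost))      # class favored by the cost matrix
```

In this sketch the matrix used to make decisions is passed separately from the matrix used to charge costs. That separation is what makes it possible to search for a different decision matrix that lowers `empirical_cost` when the probability estimates are systematically off; how that search is carried out is the subject of the paper and is not reproduced here.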
Index Terms
Cost-sensitive multi-class classification from probability estimates