ABSTRACT
To address the problem of overfitting in AdaBoost, we propose a novel AdaBoost algorithm that uses K-means clustering. AdaBoost is known to be an effective method for improving the performance of base classifiers, both theoretically and empirically. However, previous studies have shown that AdaBoost is prone to overfitting when classes overlap. To overcome this problem, the proposed method uses K-means clustering to remove hard-to-learn samples that lie in the overlapping region. Because the proposed method does not train on these hard-to-learn samples, it suffers less from overfitting than conventional AdaBoost. Experiments on both synthetic and real-world data confirm the validity of the proposed method.
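The abstract does not spell out the exact pruning criterion, but the described pipeline (cluster the training data, discard hard-to-learn samples from overlapping regions, then run AdaBoost on what remains) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `purity_threshold` parameter, the cluster-purity rule for deciding which samples are "hard to learn", and the toy data are all assumptions introduced here for clarity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier

def prune_hard_examples(X, y, n_clusters=5, purity_threshold=0.8, random_state=0):
    """Cluster X with K-means and drop samples from overlapping regions.

    Assumed criterion (not from the paper): a cluster whose majority class
    falls below `purity_threshold` is treated as an overlapping region and
    discarded; in pure clusters, only the minority-class samples are dropped.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(X)
    keep = np.zeros(len(y), dtype=bool)
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        classes, counts = np.unique(y[idx], return_counts=True)
        majority = classes[np.argmax(counts)]
        if counts.max() / counts.sum() >= purity_threshold:
            # Pure cluster: keep its majority-class samples only.
            keep[idx[y[idx] == majority]] = True
        # Impure (overlapping) clusters are discarded entirely.
    return X[keep], y[keep]

# Toy two-class data with a small overlapping boundary region.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(4.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

Xp, yp = prune_hard_examples(X, y)          # pruned training set
clf = AdaBoostClassifier(n_estimators=50).fit(Xp, yp)
```

The design choice here is to prune once, before boosting starts, so that AdaBoost never assigns (and then exponentially inflates) weights on the ambiguous boundary samples that drive its overfitting.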
Index Terms
Reducing overfitting of AdaBoost by clustering-based pruning of hard examples