
A Survey of Predictive Modeling on Imbalanced Domains

Published: 13 August 2016

Abstract

Many real-world data-mining applications involve obtaining predictive models from datasets with strongly imbalanced distributions of the target variable. Frequently, the least-common values of this target variable are associated with events that are highly relevant for end users (e.g., fraud detection, unusual returns on stock markets, or anticipation of catastrophes). Moreover, these events may have different costs and benefits, which, combined with the rarity of some of them in the available training data, creates serious problems for predictive modeling techniques. This article presents a survey of existing techniques for handling these important applications of predictive analytics. Although most of the existing work addresses classification tasks (nominal target variables), we also describe methods designed to handle similar problems within regression tasks (numeric target variables). In this survey, we discuss the main challenges raised by imbalanced domains, propose a definition of the problem, describe the main approaches to these tasks, propose a taxonomy of the methods, summarize the conclusions of existing comparative studies and of theoretical analyses of some methods, and refer to related problems within predictive modeling.
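As a small, hedged illustration of the evaluation problem sketched above (an editorial example, not code or data from the article): when the target distribution is strongly imbalanced, overall accuracy can look excellent even when every rare, relevant event is missed, which is one reason the survey examines alternative performance metrics and dedicated learning approaches. The snippet below uses only synthetic labels with roughly 1% positives.

# Hypothetical, self-contained sketch: with ~1% rare positives, a trivial
# "always predict the majority class" model reaches ~99% accuracy while
# detecting none of the rare, relevant events (e.g., fraudulent cases).

import random

random.seed(0)

# Synthetic target: roughly 1% rare positives, 99% negatives.
y_true = [1 if random.random() < 0.01 else 0 for _ in range(10_000)]

# Majority-class baseline: always predict the common (negative) class.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = true_positives / max(1, sum(y_true))  # recall on the rare class

print(f"accuracy = {accuracy:.3f}")            # ~0.99, looks excellent
print(f"recall on rare class = {recall:.3f}")  # 0.0, every rare event missed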

  243. Bianca Zadrozny, John Langford, and Naoki Abe. 2003. Cost-sensitive learning by cost-proportionate example weighting. In Third IEEE International Conference on Data Mining, 2003. ICDM 2003. IEEE, 435--442. Google ScholarGoogle ScholarDigital LibraryDigital Library
  244. Arnold Zellner. 1986. Bayesian estimation and prediction using asymmetric loss functions. J. Am. Statist. Assoc. 81, 394 (1986), 446--451.Google ScholarGoogle ScholarCross RefCross Ref
  245. Dongmei Zhang, Wei Liu, Xiaosheng Gong, and Hui Jin. 2011. A novel improved SMOTE resampling algorithm based on fractal. J. Comput. Inform. Syst. 7, 6 (2011), 2204--2211.Google ScholarGoogle Scholar
  246. Huaxiang Zhang and Mingfang Li. 2014. RWO-sampling: A random walk over-sampling approach to imbalanced data classification. Inform. Fus. 20 (2014), 99--116.Google ScholarGoogle ScholarCross RefCross Ref
  247. Huimin Zhao, Atish P. Sinha, and Gaurav Bansal. 2011. An extended tuning method for cost-sensitive regression and forecasting. Dec. Support Syst. 51, 3 (2011), 372--383. Google ScholarGoogle ScholarDigital LibraryDigital Library
  248. Zhaohui Zheng, Xiaoyun Wu, and Rohini Srihari. 2004. Feature selection for text categorization on imbalanced data. ACM SIGKDD Explor. Newslett. 6, 1 (2004), 80--89. Google ScholarGoogle ScholarDigital LibraryDigital Library
  249. Zhi-Hua Zhou and Xu-Ying Liu. 2006. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans. Knowl. Data Eng. 18, 1 (2006), 63--77. Google ScholarGoogle ScholarDigital LibraryDigital Library
  250. Jingbo Zhu and Eduard H. Hovy. 2007. Active learning for word sense disambiguation with methods for addressing the class imbalance problem. In EMNLP-CoNLL, Vol. 7. 783--790.Google ScholarGoogle Scholar
  251. Ling Zhuang and Honghua Dai. 2006a. Parameter estimation of one-class SVM on imbalance text classification. In Advances in Artificial Intelligence. Springer, 538--549. Google ScholarGoogle ScholarDigital LibraryDigital Library
  252. Ling Zhuang and Honghua Dai. 2006b. Parameter optimization of kernel-based one-class classifier on imbalance learning. J. Comput. 1, 7 (2006), 32--40.Google ScholarGoogle ScholarCross RefCross Ref

    • Published in

      ACM Computing Surveys, Volume 49, Issue 2 (June 2017), 747 pages
      ISSN: 0360-0300; EISSN: 1557-7341
      DOI: 10.1145/2966278
      • Editor: Sartaj Sahni

      Copyright © 2016 ACM


      Publisher: Association for Computing Machinery, New York, NY, United States

      Publication History

      • Received: 1 May 2015
      • Revised: 1 March 2016
      • Accepted: 1 March 2016
      • Published: 13 August 2016
