
A Survey of Predictive Modeling on Imbalanced Domains

Published: 13 August 2016

Abstract

Many real-world data-mining applications involve obtaining predictive models from datasets with strongly imbalanced distributions of the target variable. Frequently, the least-common values of this target variable are associated with events that are highly relevant for end users (e.g., fraud detection, unusual returns on stock markets, or anticipation of catastrophes). Moreover, these events may have different costs and benefits, which, combined with the rarity of some of them in the available training data, creates serious problems for predictive modeling techniques. This article presents a survey of existing techniques for handling these important applications of predictive analytics. Although most of the existing work addresses classification tasks (nominal target variables), we also describe methods designed to handle similar problems within regression tasks (numeric target variables). In this survey, we discuss the main challenges raised by imbalanced domains, propose a definition of the problem, describe the main approaches to these tasks, propose a taxonomy of the methods, summarize the conclusions of existing comparative studies and of theoretical analyses of some methods, and refer to related problems within predictive modeling.
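As a small, hedged illustration of the evaluation problem sketched above (an editorial example, not code or data from the article): when the target distribution is strongly imbalanced, overall accuracy can look excellent even when every rare, relevant event is missed, which is one reason the survey examines alternative performance metrics and dedicated learning approaches. The snippet below uses only synthetic labels with roughly 1% positives.

# Hypothetical, self-contained sketch: with ~1% rare positives, a trivial
# "always predict the majority class" model reaches ~99% accuracy while
# detecting none of the rare, relevant events (e.g., fraudulent cases).

import random

random.seed(0)

# Synthetic target: roughly 1% rare positives, 99% negatives.
y_true = [1 if random.random() < 0.01 else 0 for _ in range(10_000)]

# Majority-class baseline: always predict the common (negative) class.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = true_positives / max(1, sum(y_true))  # recall on the rare class

print(f"accuracy = {accuracy:.3f}")            # ~0.99, looks excellent
print(f"recall on rare class = {recall:.3f}")  # 0.0, every rare event missed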

  243. Bianca Zadrozny, John Langford, and Naoki Abe. 2003. Cost-sensitive learning by cost-proportionate example weighting. In Third IEEE International Conference on Data Mining, 2003. ICDM 2003. IEEE, 435--442. Google ScholarGoogle ScholarDigital LibraryDigital Library
  244. Arnold Zellner. 1986. Bayesian estimation and prediction using asymmetric loss functions. J. Am. Statist. Assoc. 81, 394 (1986), 446--451.Google ScholarGoogle ScholarCross RefCross Ref
  245. Dongmei Zhang, Wei Liu, Xiaosheng Gong, and Hui Jin. 2011. A novel improved SMOTE resampling algorithm based on fractal. J. Comput. Inform. Syst. 7, 6 (2011), 2204--2211.Google ScholarGoogle Scholar
  246. Huaxiang Zhang and Mingfang Li. 2014. RWO-sampling: A random walk over-sampling approach to imbalanced data classification. Inform. Fus. 20 (2014), 99--116.Google ScholarGoogle ScholarCross RefCross Ref
  247. Huimin Zhao, Atish P. Sinha, and Gaurav Bansal. 2011. An extended tuning method for cost-sensitive regression and forecasting. Dec. Support Syst. 51, 3 (2011), 372--383. Google ScholarGoogle ScholarDigital LibraryDigital Library
  248. Zhaohui Zheng, Xiaoyun Wu, and Rohini Srihari. 2004. Feature selection for text categorization on imbalanced data. ACM SIGKDD Explor. Newslett. 6, 1 (2004), 80--89. Google ScholarGoogle ScholarDigital LibraryDigital Library
  249. Zhi-Hua Zhou and Xu-Ying Liu. 2006. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans. Knowl. Data Eng. 18, 1 (2006), 63--77. Google ScholarGoogle ScholarDigital LibraryDigital Library
  250. Jingbo Zhu and Eduard H. Hovy. 2007. Active learning for word sense disambiguation with methods for addressing the class imbalance problem. In EMNLP-CoNLL, Vol. 7. 783--790.Google ScholarGoogle Scholar
  251. Ling Zhuang and Honghua Dai. 2006a. Parameter estimation of one-class SVM on imbalance text classification. In Advances in Artificial Intelligence. Springer, 538--549. Google ScholarGoogle ScholarDigital LibraryDigital Library
  252. Ling Zhuang and Honghua Dai. 2006b. Parameter optimization of kernel-based one-class classifier on imbalance learning. J. Comput. 1, 7 (2006), 32--40.Google ScholarGoogle ScholarCross RefCross Ref

    • Published in

      ACM Computing Surveys, Volume 49, Issue 2 (June 2017), 747 pages
      ISSN: 0360-0300; EISSN: 1557-7341
      DOI: 10.1145/2966278
      • Editor: Sartaj Sahni

      Copyright © 2016 ACM


      Publisher: Association for Computing Machinery, New York, NY, United States

      Publication History

      • Received: 1 May 2015
      • Revised: 1 March 2016
      • Accepted: 1 March 2016
      • Published: 13 August 2016
