
MDS: a novel method for class imbalance learning

Authors: Long-Sheng Chen, Yu-Shan Chang

ICUIMC '09: Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication

Pages 544 - 549

https://doi.org/10.1145/1516241.1516336

Published: 15 February 2009

Abstract

Many real-world data sets have imbalanced class distributions, in which almost all examples belong to one class and far fewer belong to the others. The minority examples usually form the more interesting class, such as rare diseases in diagnosis data, failures in inspection data, and fraud in credit screening data. A classifier induced from an imbalanced data set typically has high classification accuracy for the majority class but an unacceptable error rate for the minority class. This situation is called the class imbalance problem and has attracted considerable attention from data mining researchers. To address this problem, this work proposes a novel method called Mahalanobis Distance based Sampling (MDS). Experimental results indicate that the proposed MDS identifies the minority class better than the traditional techniques it was compared with: under-sampling, cost-adjusting, and cluster-based sampling.
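The abstract names the Mahalanobis distance but does not describe the sampling procedure itself, so the following is only a minimal sketch of the distance measure, not of the authors' MDS method. It computes sqrt((x - mu)^T S^{-1} (x - mu)) for two-feature data in plain Python. The helper names and the sample points are hypothetical and chosen purely for illustration.

```python
from math import sqrt

def mean_and_cov_2d(points):
    """Return the mean vector and sample covariance entries of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of x from a 2-D distribution (mean, covariance)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is (1/det) * [[syy, -sxy], [-sxy, sxx]].
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return sqrt(q)

# Hypothetical majority-class sample (made-up numbers, not from the paper).
majority = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (2.0, 1.0)]
mean, cov = mean_and_cov_2d(majority)

d_center = mahalanobis_2d((1.0, 0.5), mean, cov)   # the class mean itself
d_along_x = mahalanobis_2d((5.0, 0.5), mean, cov)  # 4 units away along x
d_along_y = mahalanobis_2d((1.0, 2.5), mean, cov)  # 2 units away along y
```

In this made-up sample the first feature has four times the variance of the second, so a point 4 units from the mean along x and a point 2 units from the mean along y end up at the same Mahalanobis distance. That scale awareness is the general reason to prefer it over Euclidean distance when features differ in spread.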





Reviews

Reviewer: Jan De Beule

The class imbalance problem refers to the fact that real-world data often contain a majority class of examples and a minority class of examples. Such a data set is called an imbalanced data set. A classifier induced from such a set has high classification accuracy for majority examples, but also a high error rate for minority examples. A typical imbalance problem is addressed either by an algorithm- or model-oriented approach or by data manipulation techniques. This paper discusses a novel approach to the imbalanced data problem. The proposed method, Mahalanobis distance-based sampling (MDS), is very technical and is clearly explained in the paper. Chen, Hsu, and Chang compare their method with existing ones, and conclude that their method can drastically improve the classification ability for imbalanced data. However, certain details should be further investigated and should motivate future research.

Online Computing Reviews Service
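The claim that overall accuracy can hide a high minority-class error rate can be illustrated with a small sketch. The 95/5 class split, the labels, and the function name below are assumptions made up for this example and are not data from the paper. The "classifier" is the degenerate one that always predicts the majority class.

```python
def accuracy_and_minority_recall(y_true, y_pred, minority_label=1):
    """Return overall accuracy and recall on the minority class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    minority_total = sum(t == minority_label for t in y_true)
    minority_hit = sum(
        t == minority_label and p == minority_label
        for t, p in zip(y_true, y_pred)
    )
    accuracy = correct / len(y_true)
    recall = minority_hit / minority_total if minority_total else 0.0
    return accuracy, recall

# Hypothetical data set: 95 majority examples (label 0), 5 minority (label 1).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

acc, rec = accuracy_and_minority_recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, minority recall={rec:.2f}")
```

Under these assumed numbers the accuracy is 0.95 while every minority example is misclassified, which is why work on imbalanced data usually reports minority-class measures alongside accuracy.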


Published In

ICUIMC '09: Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication

February 2009, 704 pages

ISBN: 9781605584058

DOI: 10.1145/1516241

General Chairs: Won Kim (Sungkyunkwan University, Korea), Hyung Jin Choi (Sungkyunkwan University, Korea), Dongho Won (Sungkyunkwan University, Korea)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article

Funding Sources

National Science Council Taiwan

Conference

ICUIMC '09

Sponsor:

SIGKDD

ICUIMC '09: The 3rd International Conference on Ubiquitous Information Management and Communication

January 15 - 16, 2009

Suwon, Korea

Bibliometrics

Total Citations: 4

Total Downloads: 337

Downloads (last 12 months): 0

Downloads (last 6 weeks): 0

Reflects downloads up to 13 Feb 2025

Citations

Cited By

Rarnachandra S, Chen M, Socha D (2016). Laughter detection using data mining and human feedback. 2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS), pp. 25-28. Online publication date: Aug-2016. https://doi.org/10.1109/ICSESS.2016.7883009

Guan S, Chen M, Ha H, Chen S, Shyu M, Zhang C (2015). Deep Learning with MCA-based Instance Selection and Bootstrapping for Imbalanced Data Classification. Proceedings of the 2015 IEEE Conference on Collaboration and Internet Computing (CIC), pp. 288-295. Online publication date: 27-Oct-2015. https://dl.acm.org/doi/10.1109/CIC.2015.40

Weiguo D, Li W, Yiyang W, Zhong Q (2012). An Improved SVM-KM Model for Imbalanced Datasets. Proceedings of the 2012 International Conference on Industrial Control and Electronics Engineering, pp. 100-103. Online publication date: 23-Aug-2012. https://dl.acm.org/doi/10.1109/ICICEE.2012.35

Ziti Fariha Mohd Apandi, Mustapha N, Affendey L (2011). Evaluating Integrated Weight Linear method to class imbalanced learning in video data. 2011 3rd Conference on Data Mining and Optimization (DMO), pp. 243-247. Online publication date: Jun-2011. https://doi.org/10.1109/DMO.2011.5976535
