research-article

A Novel Machine Learning Data Preprocessing Method for Enhancing Classification Algorithms Performance

Authors:
Theodoros Iliou

University of the Aegean, Department of Cultural Technology and Communication, University Hill, Mytilene, Greece, Phone: +302251036624

Christos-Nikolaos Anagnostopoulos

University of the Aegean, Department of Cultural Technology and Communication, University Hill, Mytilene, Greece, Phone: +302251036624

Marina Nerantzaki

Democritus University of Thrace Medical School, 8100, Alexandroupolis, Greece, Phone: +302551030503

George Anastassopoulos

Democritus University of Thrace Medical School, 8100, Alexandroupolis, Greece, Phone: +302551030503

EANN '15: Proceedings of the 16th International Conference on Engineering Applications of Neural Networks (INNS), September 2015, Article No. 11, Pages 1–5. https://doi.org/10.1145/2797143.2797155

Published: 25 September 2015

ABSTRACT

Data preprocessing refers to any processing applied to raw data to prepare it for a subsequent processing step. Commonly used as a preliminary data mining practice, data preprocessing transforms the data into a format that classification algorithms can process more easily and effectively. In this paper, a novel data preprocessing method is proposed and evaluated on three difficult classification data sets from the well-known UCI Repository, on which various classifiers achieve average performance below 75%. The three UCI datasets used are Mammographic Masses, Indian Liver and Contraceptive Method. The proposed method and Principal Component Analysis (PCA) preprocessing were each evaluated using 10-fold cross-validation with five classification algorithms: the nearest-neighbour classifier (IB1), the C4.5 implementation (J48), Random Forest, Multilayer Perceptron and Rotation Forest. The classification results are presented and compared analytically. The results indicate that the features generated by applying the proposed preprocessing method to the original dataset markedly improve the performance of the classification algorithms.
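The evaluation protocol named in the abstract (10-fold cross-validation of a nearest-neighbour classifier, with and without a preprocessing step) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not describe the proposed method, so min-max scaling stands in as a placeholder preprocessing step, the data are synthetic rather than the UCI sets, and the function names (`k_fold_indices`, `one_nn_predict`, `cross_val_accuracy`, `fit_min_max`, `make_toy_data`) are hypothetical. The preprocessing step is fitted on each training fold only, so the held-out fold does not influence it.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Transform = Callable[[Vector], Vector]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k disjoint folds (assumes n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def one_nn_predict(train_X: List[Vector], train_y: List[int], x: Vector) -> int:
    """Return the label of the training point nearest to x (squared Euclidean distance)."""
    nearest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[nearest]


def fit_min_max(train_X: List[Vector]) -> Transform:
    """Placeholder preprocessing (NOT the paper's method): per-feature min-max scaling."""
    lo = [min(col) for col in zip(*train_X)]
    hi = [max(col) for col in zip(*train_X)]
    span = [(h - l) or 1.0 for l, h in zip(lo, hi)]  # avoid division by zero
    return lambda v: [(a - l) / s for a, l, s in zip(v, lo, span)]


def cross_val_accuracy(
    X: List[Vector],
    y: List[int],
    k: int = 10,
    seed: int = 0,
    preprocess: Optional[Callable[[List[Vector]], Transform]] = None,
) -> float:
    """Mean 1-NN accuracy over k folds; preprocess is fitted on the training fold only."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        transform = preprocess(train_X) if preprocess else (lambda v: v)
        train_X = [transform(v) for v in train_X]
        correct = sum(
            one_nn_predict(train_X, train_y, transform(X[i])) == y[i]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return sum(scores) / k


def make_toy_data(n_per_class: int = 50, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Synthetic two-class data: Gaussian clusters centred at (0, 0) and (10, 10)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in enumerate((0.0, 10.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("raw features:   ", cross_val_accuracy(X, y))
    print("min-max scaled: ", cross_val_accuracy(X, y, preprocess=fit_min_max))
```

Passing a different `preprocess` callable (for example a PCA projection) would allow the same folds to be reused for a like-for-like comparison, which is the kind of comparison the abstract reports.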

Published in
EANN '15: Proceedings of the 16th International Conference on Engineering Applications of Neural Networks (INNS)
September 2015
266 pages
ISBN: 9781450335805
DOI: 10.1145/2797143
Editors: Lazaros Iliadis (Democritus University of Thrace, Greece), Chrisina Jane (Coventry University, UK)
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Data preprocessing
classification algorithms
data mining
machine learning
Qualifiers
- research-article
- Research
- Refereed limited
Conference

Acceptance Rates
EANN '15 Paper Acceptance Rate: 36 of 60 submissions, 60%. Overall Acceptance Rate: 36 of 60 submissions, 60%.