DOI: 10.1145/2797143.2797155
research-article

A Novel Machine Learning Data Preprocessing Method for Enhancing Classification Algorithms Performance

Published: 25 September 2015

ABSTRACT

Data preprocessing refers to any processing applied to raw data to prepare it for a subsequent processing procedure. Commonly used as a preliminary data mining practice, data preprocessing transforms the data into a format that classification algorithms can process more easily and effectively. In this paper, a novel data preprocessing method is proposed and evaluated on three difficult classification data sets from the well-known UCI Repository, on which various classifiers achieve an average performance lower than 75%. The three UCI Repository data sets used are Mammographic Masses, Indian Liver and Contraceptive Method. The performance of the proposed method and of Principal Component Analysis (PCA) preprocessing was evaluated using 10-fold cross-validation with five classification algorithms: the nearest-neighbour classifier (IB1), the C4.5 algorithm implementation (J48), Random Forest, Multilayer Perceptron and Rotation Forest. The classification results are presented and compared analytically. The results indicate that the features generated by applying the proposed preprocessing method to the original data set markedly improve the performance of the classification algorithms.
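The evaluation protocol named in the abstract, 10-fold cross-validation of a nearest-neighbour classifier, can be illustrated with a minimal sketch. This is not the authors' code and does not implement their proposed preprocessing method, which the abstract does not describe. The helper names (`k_fold_indices`, `predict_1nn`, `cross_val_accuracy`, `make_toy_data`) and the synthetic two-cluster data are assumptions made for illustration. The folds here are plain shuffled splits, not stratified ones.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    if n < k:
        raise ValueError("need at least one sample per fold")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def predict_1nn(train_X, train_y, x):
    """Label of the training point closest to x (squared Euclidean distance)."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean accuracy over k folds; each fold is held out once as the test set."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(predict_1nn(train_X, train_y, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


def make_toy_data(n_per_class=50, seed=1):
    """Hypothetical stand-in data: two well-separated Gaussian clusters in 2-D."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (8.0, 8.0)]):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"10-fold CV accuracy (1-NN, toy data): {cross_val_accuracy(X, y):.3f}")
```

To compare preprocessing methods in this setup, one would transform `X` before calling `cross_val_accuracy` and compare the resulting mean accuracies. To avoid leaking test information, any transformation that is fitted to data (PCA, for example) should be fitted on the training folds only.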


          • Published in

            ACM Other conferences
            EANN '15: Proceedings of the 16th International Conference on Engineering Applications of Neural Networks (INNS)
            September 2015
            266 pages
            ISBN: 9781450335805
            DOI: 10.1145/2797143

            Copyright © 2015 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States



            Qualifiers

            • research-article
            • Research
            • Refereed limited

            Acceptance Rates

            EANN '15 Paper Acceptance Rate: 36 of 60 submissions, 60%
            Overall Acceptance Rate: 36 of 60 submissions, 60%
