DOI: 10.1145/2797143.2797156
research-article

Optimizing voting classification using cluster analysis on medical diagnosis data

Published: 25 September 2015

ABSTRACT

The voting ensemble method combines the results of single classifiers with the aim of improving classification performance. It is generally accepted, however, that the classifiers combined during voting should be both diverse and accurate. In this study, we applied cluster analysis, an unsupervised method, to four medical diagnosis datasets in order to group the single classifiers according to their individual results. Using this information, we selected the most accurate classifier from each group of similar classifiers and proposed the resulting set as the optimal classifier combination for each dataset. The results show that the estimated combination was actually the best performing during voting training for two of the datasets, while for the other two it was among the combinations that outperformed the single classifiers. The proposed methodology is a quick and easy tool for estimating classifier combinations that outperform the single classifiers during voting.
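
The selection idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the distance measure (pairwise disagreement rate), the average-linkage clustering, the number of clusters `k`, the function names, and the toy predictions are all assumptions chosen for illustration. For brevity, accuracy is measured on the same labels used for voting, which a real study would avoid.

```python
from collections import Counter
from itertools import combinations


def accuracy(pred, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def disagreement(a, b):
    """Fraction of instances on which two classifiers predict differently."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_classifiers(preds, k):
    """Group classifier names into k clusters (average linkage, assumed)."""
    clusters = [[name] for name in preds]
    while len(clusters) > k:
        def linkage(pair):
            i, j = pair
            dists = [disagreement(preds[a], preds[b])
                     for a in clusters[i] for b in clusters[j]]
            return sum(dists) / len(dists)
        # Merge the two clusters whose members disagree the least.
        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def select_and_vote(preds, truth, k):
    """Keep the most accurate classifier per cluster, then majority-vote."""
    clusters = cluster_classifiers(preds, k)
    chosen = [max(c, key=lambda n: accuracy(preds[n], truth)) for c in clusters]
    voted = [Counter(preds[n][i] for n in chosen).most_common(1)[0][0]
             for i in range(len(truth))]
    return chosen, voted


# Hypothetical predictions of five classifiers on eight instances.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
preds = {
    "A": [1, 1, 1, 0, 0, 0, 0, 0],
    "B": [1, 1, 1, 0, 0, 0, 0, 1],
    "C": [0, 1, 1, 1, 0, 0, 0, 0],
    "D": [0, 1, 1, 1, 1, 0, 0, 0],
    "E": [1, 1, 0, 1, 0, 0, 1, 0],
}

chosen, voted = select_and_vote(preds, truth, k=3)
print(chosen)                   # ['A', 'C', 'E']
print(accuracy(voted, truth))   # 1.0
```

In this toy case, classifiers A and B make nearly the same errors, as do C and D, so clustering keeps only the more accurate member of each pair. The three selected classifiers err on different instances, and their majority vote is more accurate than any single one of them.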


Published in

EANN '15: Proceedings of the 16th International Conference on Engineering Applications of Neural Networks (INNS)
September 2015, 266 pages
ISBN: 9781450335805
DOI: 10.1145/2797143

Copyright © 2015 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

EANN '15 Paper Acceptance Rate: 36 of 60 submissions, 60%
Overall Acceptance Rate: 36 of 60 submissions, 60%
