Optimizing voting classification using cluster analysis on medical diagnosis data

Authors:
Androniki Tamvakis
University of the Aegean, Department of Marine Sciences, University Hill, Mytilene, Greece, Phone: +302251036811

Christos-Nikolaos Anagnostopoulos
University of the Aegean, Department of Cultural Technology and Communication, University Hill, Mytilene, Greece, Phone: +302251036624

George Tsekouras
University of the Aegean, Department of Cultural Technology and Communication, University Hill, Mytilene, Greece, Phone: +302251036631

George Anastassopoulos
Democritus University of Thrace, Medical School, 8100, Alexandroupolis, Greece, Phone: +302551030503

EANN '15: Proceedings of the 16th International Conference on Engineering Applications of Neural Networks (INNS), September 2015, Article No.: 12, Pages 1–7, https://doi.org/10.1145/2797143.2797156

Published: 25 September 2015

ABSTRACT

The voting ensemble method combines the results of single classifiers with the aim of improving classification performance. However, it is intuitively accepted that the classifiers combined during voting should be both diverse and accurate. In this study, we applied cluster analysis, an unsupervised method, to four datasets related to medical diagnosis in order to differentiate the single classifiers according to their individual results. Using this information, we selected the most accurate among similar classifiers and proposed the optimal classifier combination for each dataset. The results show that the estimated combination was actually the best performing during voting training for two of the datasets, while in the other two it was one of those that outperformed the single classifiers. The proposed methodology is a quick and easy tool for estimating classifier combinations that outperform the single classifiers during voting.
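
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the clustering algorithm, the similarity measure, or the number of clusters, so the pairwise disagreement distance, the average-linkage agglomerative clustering, the choice of three clusters, the classifier names, and the toy predictions below are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def accuracy(pred, truth):
    """Fraction of samples where the prediction matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def disagreement(a, b):
    """Fraction of samples on which two classifiers give different outputs."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_classifiers(preds, n_clusters):
    """Average-linkage agglomerative clustering on pairwise disagreement
    (an assumed choice; the source does not specify the clustering setup)."""
    clusters = [[name] for name in preds]
    while len(clusters) > n_clusters:
        def linkage(pair):
            left, right = clusters[pair[0]], clusters[pair[1]]
            dists = [disagreement(preds[a], preds[b]) for a in left for b in right]
            return sum(dists) / len(dists)

        # Merge the two clusters whose members disagree the least on average.
        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def select_members(preds, truth, n_clusters):
    """Keep the most accurate classifier from each cluster of similar ones."""
    clusters = cluster_classifiers(preds, n_clusters)
    return [max(c, key=lambda name: accuracy(preds[name], truth)) for c in clusters]


def majority_vote(preds, members):
    """Plurality vote over the selected members, sample by sample."""
    columns = zip(*(preds[m] for m in members))
    return [Counter(col).most_common(1)[0][0] for col in columns]


# Hypothetical toy data: true labels and per-classifier training predictions.
truth = [1, 0, 1, 1, 0, 0, 1, 0]
preds = {
    "tree_a": [1, 0, 1, 1, 0, 0, 0, 0],
    "tree_b": [1, 0, 1, 1, 0, 1, 0, 0],
    "bayes":  [0, 0, 1, 1, 0, 0, 1, 0],
    "knn_a":  [1, 1, 1, 1, 0, 0, 1, 0],
    "knn_b":  [1, 1, 1, 0, 0, 0, 1, 0],
}

members = select_members(preds, truth, n_clusters=3)
voted = majority_vote(preds, members)
print(members, accuracy(voted, truth))
```

In this toy setting the two tree-like and the two kNN-like classifiers fall into the same clusters, so only the more accurate of each pair joins the vote. The selected members make their errors on different samples, which is why the vote can outperform every single classifier here; real datasets need not behave this cleanly.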

Published in
EANN '15: Proceedings of the 16th International Conference on Engineering Applications of Neural Networks (INNS)
September 2015, 266 pages
ISBN: 9781450335805
DOI: 10.1145/2797143
Editors: Lazaros Iliadis (Democritus University of Thrace, Greece), Chrisina Jayne (Coventry University, UK)
Copyright © 2015 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Voting ensemble method
classification
cluster analysis
medical diagnosis
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
EANN '15 Paper Acceptance Rate: 36 of 60 submissions, 60%
Overall Acceptance Rate: 36 of 60 submissions, 60%