ABSTRACT
The voting ensemble method combines the results of single classifiers with the aim of improving classification performance. It is intuitively accepted, however, that the classifiers combined during voting should be both diverse and accurate. In this study, we applied cluster analysis, an unsupervised method, to four medical diagnosis datasets in order to differentiate the single classifiers according to their individual results. Using this information, we selected the most accurate classifier among each group of similar classifiers and proposed the resulting set as the optimal classifier combination for each dataset. The results show that the estimated combination was the best-performing one during voting training for two of the datasets, while for the other two it was among the combinations that outperformed the single classifiers. The proposed methodology is a quick and easy tool for estimating classifier combinations that outperform the single classifiers during voting.
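The selection procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which clustering algorithm, distance measure, or number of clusters was used, so average-linkage agglomerative clustering on the pairwise disagreement rate and a fixed cluster count of three are assumptions made here. The classifier names and prediction vectors are hypothetical toy data.

```python
from collections import Counter
from itertools import combinations


def accuracy(pred, truth):
    """Fraction of instances on which a prediction vector matches the labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def disagreement(a, b):
    """Fraction of instances on which two classifiers predict differently."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_classifiers(preds, n_clusters):
    """Group classifiers by similarity of their predictions.

    Assumption: average-linkage agglomerative clustering on the
    disagreement rate; the source does not name the method it used.
    """
    clusters = [[name] for name in preds]

    def linkage(pair):
        i, j = pair
        dists = [disagreement(preds[a], preds[b])
                 for a in clusters[i] for b in clusters[j]]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Merge the two clusters whose members disagree least on average.
        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters


def select_representatives(preds, truth, n_clusters):
    """Keep the most accurate classifier from each cluster."""
    return [max(cluster, key=lambda name: accuracy(preds[name], truth))
            for cluster in cluster_classifiers(preds, n_clusters)]


def majority_vote(preds, members):
    """Plurality vote per instance over the selected classifiers."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*(preds[m] for m in members))]


# Hypothetical toy data: true labels and five classifiers' predictions.
truth = [1, 0, 1, 1, 0, 1, 0, 0]
preds = {
    "tree_a":  [1, 0, 1, 1, 0, 1, 1, 1],
    "tree_b":  [1, 0, 1, 1, 0, 1, 0, 1],
    "bayes_a": [0, 0, 1, 1, 0, 1, 0, 0],
    "bayes_b": [0, 1, 1, 1, 0, 1, 0, 0],
    "knn":     [1, 0, 0, 1, 0, 1, 0, 0],
}

selected = select_representatives(preds, truth, n_clusters=3)
voted = majority_vote(preds, selected)
print(selected, accuracy(voted, truth))
```

In this toy case the two tree-like and the two Bayes-like classifiers fall into separate clusters, so the vote draws on one member of each group rather than on near-duplicates. With an even number of voters or more than two classes, ties in `majority_vote` are broken by first occurrence, which a real application would need to handle deliberately.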