DOI: 10.1145/1389095.1389362
research-article

An efficient SVM-GA feature selection model for large healthcare databases

Published: 12 July 2008

Abstract

This paper presents an efficient hybrid feature selection model based on Support Vector Machine (SVM) and Genetic Algorithm (GA) for large healthcare databases. Although SVM and GA are both robust computational paradigms, the combined iterative nature of an SVM-GA hybrid system makes runtime costs prohibitive on large databases. The paper uses hierarchical clustering to reduce dataset size and SVM training time, multi-resolution parameter search for efficient SVM model selection, and chromosome caching to avoid redundant fitness evaluations. This approach significantly reduces runtime and improves classification performance.
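
The chromosome caching idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `CachedFitness`, the helper `toy_fitness`, and the five-feature bit strings are all hypothetical, and the toy scoring function merely stands in for the expensive step of training and validating an SVM on the selected features. The sketch assumes a chromosome is a fixed-length sequence of 0/1 flags, one per feature.

```python
from typing import Callable, Dict, Sequence, Tuple

Chromosome = Tuple[int, ...]  # one bit per feature: 1 = feature selected


class CachedFitness:
    """Wraps an expensive fitness function and memoizes results per chromosome."""

    def __init__(self, evaluate: Callable[[Chromosome], float]) -> None:
        self._evaluate = evaluate
        self._cache: Dict[Chromosome, float] = {}
        self.hits = 0      # evaluations answered from the cache
        self.misses = 0    # evaluations that ran the wrapped function

    def __call__(self, bits: Sequence[int]) -> float:
        key = tuple(bits)  # tuples are hashable, so they can be dict keys
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        score = self._evaluate(key)
        self._cache[key] = score
        return score


def toy_fitness(chromosome: Chromosome) -> float:
    """Hypothetical stand-in for SVM training plus validation accuracy."""
    useful = (1, 0, 1, 1, 0)  # assumed "ideal" feature mask, for illustration only
    matches = sum(1 for a, b in zip(chromosome, useful) if a == b)
    return matches / len(useful)


fitness = CachedFitness(toy_fitness)
# The first and third individuals are identical, as often happens in a GA population.
population = [(1, 0, 1, 1, 0), (0, 0, 1, 1, 0), (1, 0, 1, 1, 0)]
scores = [fitness(c) for c in population]
```

In this sketch the repeated chromosome is scored only once; the `hits` and `misses` counters show how many evaluations were skipped. The benefit in a real SVM-GA loop would depend on how often the GA regenerates identical feature subsets.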



Published In

GECCO '08: Proceedings of the 10th annual conference on Genetic and evolutionary computation
July 2008
1814 pages
ISBN:9781605581309
DOI:10.1145/1389095
  • Conference Chair: Conor Ryan
  • Editor: Maarten Keijzer
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. classifier systems
  2. data mining
  3. genetic algorithms
  4. machine learning
  5. optimization
  6. parameter tuning
  7. support vector machines

Qualifiers

  • Research-article

Conference

GECCO08

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%


Cited By

  • (2018) One Class Genetic-Based Feature Selection for Classification in Large Datasets. Big Data, Cloud and Applications, 301-311. DOI: 10.1007/978-3-319-96292-4_24. Online publication date: 14-Aug-2018
  • (2017) Sparse autoencoder based feature learning for unmanned aerial vehicle landforms image classification. 2017 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE Conference on Robotics, Automation and Mechatronics (RAM), 1-6. DOI: 10.1109/ICCIS.2017.8274739. Online publication date: Nov-2017
  • (2016) Gait and Posture Analysis Method Based on Genetic Algorithm and Support Vector Machines with Acceleration Data. Journal of Robotics and Mechatronics, 28:3, 418-424. DOI: 10.20965/jrm.2016.p0418. Online publication date: 17-Jun-2016
  • (2014) A biological continuum based approach for efficient clinical classification. Journal of Biomedical Informatics, 47:C, 28-38. DOI: 10.1016/j.jbi.2013.09.002. Online publication date: 1-Feb-2014
  • (2009) Improved Prediction of Protein Binding Sites from Sequences Using Genetic Algorithm. The Protein Journal, 28:6, 273-280. DOI: 10.1007/s10930-009-9192-1. Online publication date: 24-Jul-2009
  • (2009) Detection of Masses in Mammographic Images Using Simpson's Diversity Index in Circular Regions and SVM. Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition, 540-553. DOI: 10.1007/978-3-642-03070-3_41. Online publication date: 21-Jul-2009
