DOI: 10.1145/1830483.1830514
research-article

On the use of genetic programming for the prediction of survival in cancer

Published: 07 July 2010

Abstract

The classification of cancer patients into risk classes is a very active field of research, with direct clinical applications. We have recently compared several machine learning methods on the well-known 70-gene signature dataset. In that study, genetic programming showed promising results, outperforming all the other techniques. Nevertheless, the study was preliminary, mainly because the validation dataset was preprocessed and all its features were binarized so that logical operators could be used as the genetic programming functional nodes. Although this choice allowed a simple interpretation of the solutions from the biological viewpoint, the binarization was limiting, since it amounts to a sizable loss of information. The goal of this paper is to overcome this limitation by using the 70-gene signature dataset with real-valued expression data. The results we present show that genetic programming using the number of incorrectly classified instances as its fitness function is not able to outperform the other machine learning methods. However, when a weighted average of false positives and false negatives is used to calculate fitness values, genetic programming achieves performance comparable to the other methods in minimizing incorrectly classified instances and outperforms all the other methods in minimizing false negatives, which is one of the main goals in breast cancer clinical applications. In this case too, the solutions returned by genetic programming are simple, easy to understand, and use a rather limited subset of the available features.
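To make the two fitness definitions in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes binary labels where 1 marks the class whose misses count as false negatives (for example, poor prognosis), that lower fitness is better, and that the weight `fn_weight` is a free parameter; the abstract does not state the weights actually used, and all function names here are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) for binary labels in {0, 1}."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp, fn


def error_count_fitness(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """First fitness in the abstract: number of incorrectly classified instances."""
    fp, fn = confusion_counts(y_true, y_pred)
    return float(fp + fn)


def weighted_fitness(y_true: Sequence[int], y_pred: Sequence[int],
                     fn_weight: float = 0.75) -> float:
    """Second fitness: weighted average of false negatives and false positives.

    fn_weight = 0.75 is an illustrative assumption, not a value from the paper.
    """
    if not 0.0 <= fn_weight <= 1.0:
        raise ValueError("fn_weight must lie in [0, 1]")
    fp, fn = confusion_counts(y_true, y_pred)
    return fn_weight * fn + (1.0 - fn_weight) * fp


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]
    misses_positives = [0, 0, 1, 0, 0, 0]   # two false negatives
    over_predicts = [1, 1, 1, 1, 1, 0]      # two false positives
    for name, pred in [("misses_positives", misses_positives),
                       ("over_predicts", over_predicts)]:
        print(name, error_count_fitness(labels, pred), weighted_fitness(labels, pred))
```

Both example classifiers make two errors, so the plain error count cannot tell them apart. With `fn_weight = 0.75` the classifier that misses positive cases scores 1.5 and the one that over-predicts scores 0.5, so selection pressure shifts toward candidates with fewer false negatives.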

Cited By

  • (2011) A comparison of machine learning methods for the prediction of breast cancer. Proceedings of the 9th European conference on Evolutionary computation, machine learning and data mining in bioinformatics, pp. 159-170. DOI: 10.5555/2008362.2008380. Online publication date: 27-Apr-2011
  • (2011) A Comparison of Machine Learning Methods for the Prediction of Breast Cancer. Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, pp. 159-170. DOI: 10.1007/978-3-642-20389-3_17. Online publication date: 2011

Published In

GECCO '10: Proceedings of the 12th annual conference on Genetic and evolutionary computation
July 2010
1520 pages
ISBN: 9781450300728
DOI: 10.1145/1830483

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cancer patients
  2. classification
  3. computational biology
  4. genetic programming

Qualifiers

  • Research-article

Conference

GECCO '10

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
