ABSTRACT
Constantly improving gene expression profiling technologies are expected to provide understanding and insight into cancer-related cellular processes. Gene expression data is also expected to significantly aid in the development of efficient cancer diagnosis and classification platforms. In this work we examine two sets of gene expression data measured across sets of tumor and normal clinical samples. One set consists of 2,000 genes, measured in 62 epithelial colon samples [1]. The second consists of ≈ 100,000 clones, measured in 32 ovarian samples (unpublished, extension of data set described in [26]).
We examine the use of scoring methods that measure how well individual gene expression levels separate tumors from normals. These are then coupled with high-dimensional classification methods to assess the classification power of complete expression profiles. We present results of leave-one-out cross validation (LOOCV) experiments on the two data sets, employing SVM [8], AdaBoost [13] and a novel clustering-based classification technique. As tumor samples can differ from normal samples in their cell-type composition, we also perform LOOCV experiments using appropriately modified sets of genes, attempting to eliminate the resulting bias.
We demonstrate a success rate of at least 90% in tumor vs. normal classification, using sets of selected genes both with and without members related to cellular contamination. These results are insensitive to the exact selection mechanism, over a certain range.
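The evaluation protocol described above (score genes individually, select a subset, then run LOOCV) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the scoring function, so `threshold_score` counts the errors of the best single-threshold rule per gene as a stand-in; a nearest-centroid rule replaces SVM, AdaBoost and the clustering-based classifier; and the data are synthetic, not the colon or ovarian sets. Gene selection is repeated inside every fold so that the held-out sample never influences which genes are chosen.

```python
import random

def threshold_score(values, labels):
    """Fewest errors of any single-threshold rule on one gene (lower is better).
    Hypothetical stand-in for the paper's per-gene separation score."""
    pairs = sorted(zip(values, labels))
    n, n_pos = len(labels), sum(labels)
    n_neg = n - n_pos
    best = min(n_pos, n_neg)  # threshold placed below every sample
    pos_left = 0
    for i, (v, lab) in enumerate(pairs, start=1):
        pos_left += lab
        if i < n and pairs[i][0] == v:
            continue  # no threshold can separate equal expression values
        neg_left = i - pos_left
        err_low_is_normal = pos_left + (n_neg - neg_left)
        err_low_is_tumor = neg_left + (n_pos - pos_left)
        best = min(best, err_low_is_normal, err_low_is_tumor)
    return best

def select_genes(X, y, k):
    """Indices of the k genes with the lowest threshold score."""
    scores = [threshold_score([row[g] for row in X], y) for g in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda g: scores[g])[:k]

def nearest_centroid_predict(X_train, y_train, x, genes):
    """Assign x to the class whose mean profile (on the selected genes) is closest.
    Assumes both classes are present in the training samples."""
    dists = {}
    for cls in (0, 1):
        rows = [r for r, lab in zip(X_train, y_train) if lab == cls]
        centroid = [sum(r[g] for r in rows) / len(rows) for g in genes]
        dists[cls] = sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
    return min(dists, key=dists.get)

def loocv_accuracy(X, y, k):
    """Leave-one-out cross validation with gene selection redone in each fold."""
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(X_tr, y_tr, k)
        correct += nearest_centroid_predict(X_tr, y_tr, X[i], genes) == y[i]
    return correct / len(X)

def make_toy_data(n_samples=20, n_genes=50, n_informative=5, seed=0):
    """Synthetic expression matrix: label 1 = tumor, label 0 = normal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += 3.0 * label  # shift informative genes in tumor samples
        X.append(row)
        y.append(label)
    return X, y

X, y = make_toy_data()
acc = loocv_accuracy(X, y, k=5)
print(f"LOOCV accuracy with 5 selected genes: {acc:.2f}")
```

Varying `k` in this sketch is one simple way to probe how sensitive the estimated accuracy is to the size of the selected gene set.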
- 1. U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Nat. Acad. Sci. USA, 96:6745-6750, 1999.
- 2. A. Ben-Dor, R. Shamir, and Z. Yakhini. Clustering gene expression patterns. Journal of Computational Biology, 6:281-297, 1999.
- 3. C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, U.K., 1995.
- 4. M. P. S. Brown, W. N. Grundy, D. Lin, N. Cristianini, C. Sugnet, T. S. Furey, M. Ares Jr., and D. Haussler. Knowledge-based analysis of microarray gene expression data using support vector machines. Technical Report UCSC-CRL-99-09, UC Santa Cruz, 1999.
- 5. C. J. C. Burges. A tutorial on Support Vector Machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121-167, 1998.
- 6. S. Chu, J. DeRisi, M. Eisen, J. Mulholland, D. Botstein, P. Brown, and I. Herskowitz. The transcriptional program of sporulation in budding yeast. Science, 282:699-705, 1998.
- 7. P. A. Clarke, M. George, D. Cunningham, I. Swift, and P. Workman. Analysis of tumor gene expression following chemotherapeutic treatment of patients with bowel cancer. In Proc. Nature Genetics Microarray Meeting 99, page 39, Scottsdale, Arizona, 1999.
- 8. C. Cortes and V. Vapnik. Support vector machines. Machine Learning, 20:273-297, 1995.
- 9. J. DeRisi, V. Iyer, and P. Brown. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 282:699-705, 1997.
- 10. R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, New York, 1973.
- 11. M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. Proc. Nat. Acad. Sci. USA, 95:14863-14868, 1998.
- 12. B. Everitt. Cluster Analysis. Edward Arnold, London, third edition, 1993.
- 13. Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. Computer and System Sciences, 55:119-139, 1997.
- 14. T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, J. P. Mesirov, M. Gaasenbeek, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286:531-537, 1999.
- 15. V. R. Iyer, M. B. Eisen, D. T. Ross, G. Schuler, T. Moore, J. C. F. Lee, J. M. Trent, L. M. Staudt, J. Hudson, M. S. Boguski, D. Lashkari, D. Shalon, D. Botstein, and P. O. Brown. The transcriptional program in the response of human fibroblasts to serum. Science, 283:83-87, 1999.
- 16. Kim lab home page. http://cmgm.stanford.edu/~kimlab/.
- 17. J. Khan, R. Simon, M. Bittner, Y. Chen, S. B. Leighton, T. Pohida, P. D. Smith, Y. Jiang, G. C. Gooden, J. M. Trent, and P. S. Meltzer. Gene expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays. Cancer Research, 1998.
- 18. R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proc. Fourteenth International Joint Conference on Artificial Intelligence (IJCAI '95), pages 1137-1143. Morgan Kaufmann, San Francisco, Calif., 1995.
- 19. D. J. Lockhart, H. Dong, M. C. Byrne, M. T. Follettie, M. V. Gallo, M. S. Chee, M. Mittmann, C. Wang, M. Kobayashi, H. Horton, and E. L. Brown. DNA expression monitoring by hybridization of high density oligonucleotide arrays. Nature Biotechnology, 14:1675-1680, 1996.
- 20. L. Mason, P. Bartlett, and J. Baxter. Direct optimization of margins improves generalization in combined classifiers. In Advances in Neural Information Processing Systems 11. MIT Press, Cambridge, Mass., 1999.
- 21. C. M. Perou, S. S. Jeffrey, M. van de Rijn, C. A. Rees, M. B. Eisen, D. T. Ross, A. Pergamenschikov, C. F. Williams, S. X. Zhu, J. C. F. Lee, D. Lashkari, D. Shalon, P. O. Brown, and D. Botstein. Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc. Nat. Acad. Sci. USA, 96:9212-9217, 1999.
- 22. B. D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, 1996.
- 23. R. E. Schapire. The strength of weak learnability. Machine Learning, 5:197-227, 1990.
- 24. R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics, 26:1651-1686, 1998.
- 25. T. H. Schiedeck, S. Christoph, M. Duchrow, and H. P. Bruch. Detection of hL6-mRNA: new possibilities in serologic tumor diagnosis of colorectal carcinomas. Zentralbl Chir, 123(2):159-162, 1998.
- 26. M. Schummer, W. Ng, R. Bumgarner, P. Nelson, B. Schummer, L. Hassell, L. R. Baldwin, B. Karlan, and L. Hood. Comparative hybridization of an array of 21,500 ovarian cDNAs for the discovery of genes overexpressed in ovarian carcinomas. Gene, 238:375-385, 1999.
- 27. J. Swets. Measuring the accuracy of diagnostic systems. Science, 240:1285-1293, 1988.
- 28. V. Vapnik. Statistical Learning Theory. John Wiley & Sons, New York, 1999.
- 29. X. Wen, S. Fuhrman, G. S. Michaels, D. B. Carr, S. Smith, J. L. Barker, and R. Somogyi. Large-scale temporal gene expression mapping of central nervous system development. Proc. Nat. Acad. Sci. USA, 95:334-339, 1998.
- 30. Y. Y. Xiang, D. Y. Wang, M. Tanaka, M. Suzuki, E. Kiyokawa, H. Igarashi, Y. Naito, Q. Shen, and H. Sugimura. Expression of high-mobility group-I mRNA in human gastrointestinal adenocarcinoma and corresponding non-cancerous mucosa. Int. J. Cancer, 74(1):1-6, Feb 1997.
- Tissue classification with gene expression profiles