
Effective Gene Selection Method With Small Sample Sets Using Gradient-Based and Point Injection Techniques

Published: 01 July 2007

Abstract

Microarray gene expression data usually consist of a large number of genes, of which only a small fraction is informative for cancer diagnosis. This paper focuses on the effective identification of informative genes. We analyze gene selection models from the perspective of optimization theory and, on that basis, design a new strategy for modifying conventional search engines. Because overfitting is likely to occur in microarray data owing to their small sample sets, we also develop a point injection technique to address it. The proposed strategies have been evaluated on three kinds of cancer diagnosis. Our results show that they substantially improve the performance of gene selection. The experimental results also indicate that the proposed methods are robust in all the investigated cases.

