Effective Gene Selection Method With Small Sample Sets Using Gradient-Based and Point Injection Techniques

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 4, Issue 3, pp. 467–475, https://doi.org/10.1109/tcbb.2007.1021

Published: 01 July 2007

Abstract

Microarray gene expression data usually consist of a large number of genes. Among these genes, only a small fraction is informative for cancer diagnosis. This paper focuses on the effective identification of informative genes. We analyze gene selection models from the perspective of optimization theory and, based on this analysis, design a new strategy to modify conventional search engines. Because overfitting is likely to occur in microarray data owing to their small sample sets, we also develop a point injection technique to address it. The proposed strategies have been evaluated on three kinds of cancer diagnosis. Our results show that the proposed strategies can improve the performance of gene selection substantially. The experimental results also indicate that the proposed methods are very robust under all the investigated cases.
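The abstract does not spell out how the point injection technique works. As a rough illustration only, the sketch below shows one generic way to enlarge a small sample set by injecting synthetic points near the observed samples. The helper name `inject_points`, the Gaussian jitter, and the `scale` parameter are assumptions made for this example and should not be read as the paper's method.

```python
import numpy as np

def inject_points(X, y, n_new_per_sample=5, scale=0.1, seed=0):
    """Augment a small sample set with jittered copies of each sample.

    Hypothetical helper: Gaussian jitter scaled by each gene's standard
    deviation is an assumption, not the scheme used in the paper.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Per-gene spread, so that jitter is proportional to each gene's range.
    gene_std = X.std(axis=0, keepdims=True)
    # Repeat every sample (and its label) n_new_per_sample times.
    X_rep = np.repeat(X, n_new_per_sample, axis=0)
    y_rep = np.repeat(y, n_new_per_sample)
    # Injected points are the repeated samples plus small random noise.
    noise = rng.normal(size=X_rep.shape) * scale * gene_std
    X_aug = np.vstack([X, X_rep + noise])
    y_aug = np.concatenate([y, y_rep])
    return X_aug, y_aug

# Toy data: 4 samples (rows), 3 genes (columns), 2 classes.
X = np.array([[1.0, 5.0, 0.2],
              [1.2, 4.0, 0.1],
              [3.0, 5.5, 0.3],
              [3.1, 4.2, 0.2]])
y = np.array([0, 0, 1, 1])

X_aug, y_aug = inject_points(X, y, n_new_per_sample=5)
print(X_aug.shape)  # original 4 samples plus 20 injected points
```

A gene selection criterion could then be evaluated on `X_aug` and `y_aug` instead of the original samples, with the aim of making the estimate less sensitive to any single observation. How much noise to inject is a tuning choice that this sketch leaves open.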

Published in

IEEE/ACM Transactions on Computational Biology and Bioinformatics Volume 4, Issue 3
July 2007
192 pages
ISSN: 1545-5963
Publisher
IEEE Computer Society Press
Washington, DC, United States
Author Tags
gene selection
gradient based learning
optimization theory
point injection