ABSTRACT
Microarray technology has generated enormous amounts of high-dimensional gene expression data, providing a unique platform for exploring gene regulatory networks. However, the curse of dimensionality plagues efforts to analyze these high-throughput data. Linear Discriminant Analysis (LDA) and Biased Discriminant Analysis (BDA) are two popular dimension reduction techniques that assign different roles to the positive and negative samples when finding a discriminating subspace. Both have clear drawbacks: LDA has limited efficiency in classifying samples drawn from subclasses with different distributions, and BDA does not account for the underlying distribution of the negative samples. In this paper, we propose a novel dimension reduction technique for microarray analysis, Adaptive Discriminant Analysis (ADA), which exploits the favorable attributes of both BDA and LDA while avoiding their unfavorable ones. ADA finds a discriminative subspace that adapts to different sample distributions. It not only alleviates the problem of high dimensionality but also improves classification performance in the subspace with a naïve Bayes classifier. To learn the model that best fits the real scenario, we further propose boosted Adaptive Discriminant Analysis. Extensive experiments on the yeast cell cycle regulation data set and on expression data from the red blood cell cycle of the malaria parasite Plasmodium falciparum demonstrate the superior performance of ADA and boosted ADA. We also present putative genes of specific functional classes predicted by boosted ADA. Their potential functionality is confirmed by independent predictions based on Gene Ontology, demonstrating that ADA and boosted ADA are effective dimension reduction methods for microarray-based classification.
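The contrast between LDA and BDA drawn above can be made concrete with a minimal NumPy sketch. It uses the standard textbook forms of the two criteria, not the ADA formulation, which the abstract does not specify. The function names (`lda_scatter`, `bda_scatter`, `top_directions`), the ridge term, and the synthetic data are assumptions introduced only for illustration.

```python
import numpy as np


def lda_scatter(X_pos, X_neg):
    """Two-class LDA: both classes are treated symmetrically."""
    groups = [(X_pos, X_pos.mean(axis=0)), (X_neg, X_neg.mean(axis=0))]
    mean_all = np.vstack([X_pos, X_neg]).mean(axis=0)
    # Between-class scatter: class means around the overall mean.
    between = sum(len(X) * np.outer(m - mean_all, m - mean_all) for X, m in groups)
    # Within-class scatter: each sample around its own class mean.
    within = sum((X - m).T @ (X - m) for X, m in groups)
    return between, within


def bda_scatter(X_pos, X_neg):
    """BDA: everything is measured relative to the positive-class mean."""
    mean_pos = X_pos.mean(axis=0)
    # Negatives are pushed away from the positive mean; their own
    # distribution (mean, subclusters) is never modelled.
    neg_about_pos = (X_neg - mean_pos).T @ (X_neg - mean_pos)
    pos_within = (X_pos - mean_pos).T @ (X_pos - mean_pos)
    return neg_about_pos, pos_within


def top_directions(numerator, denominator, k=1, ridge=1e-3):
    """Top-k directions w maximising (w' numerator w) / (w' denominator w).

    The ridge term is an illustrative assumption: it keeps the denominator
    invertible when features outnumber samples, as is typical for microarrays.
    """
    d = denominator.shape[0]
    reg = denominator + ridge * np.trace(denominator) / d * np.eye(d)
    # Whiten with the Cholesky factor so a symmetric eigensolver can be used.
    L_inv = np.linalg.inv(np.linalg.cholesky(reg))
    eigvals, eigvecs = np.linalg.eigh(L_inv @ numerator @ L_inv.T)
    # eigh returns ascending eigenvalues; reverse to take the largest first.
    return L_inv.T @ eigvecs[:, ::-1][:, :k]


# Synthetic data (hypothetical): the classes differ only along the first feature.
rng = np.random.default_rng(0)
X_pos = rng.normal(size=(200, 5))
X_neg = rng.normal(size=(300, 5)) + np.array([4.0, 0.0, 0.0, 0.0, 0.0])

w_lda = top_directions(*lda_scatter(X_pos, X_neg))
w_bda = top_directions(*bda_scatter(X_pos, X_neg))
print(w_lda.shape, w_bda.shape)  # (5, 1) (5, 1)
```

The two functions differ only in what the scatter matrices are centred on. LDA centres each class on its own mean, whereas BDA centres both terms on the positive mean, which is why it is insensitive to how the negative samples are distributed.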