Abstract
Cluster analysis has proven to be a useful tool for investigating the association structure among genes in a microarray data set. There is a rich literature on cluster analysis, and various techniques have been developed. Such analyses depend heavily on an appropriate (dis)similarity measure. In this paper, we introduce a general clustering approach based on the confidence interval inferential methodology and apply it to gene expression data from microarray experiments. Emphasis is placed on data with low replication (three or five replicates). The proposed method makes more efficient use of the measured data and avoids the subjective choice of a dissimilarity measure. When applied to real data, this new methodology provides an easy-to-use bioinformatics solution for the cluster analysis of microarray experiments with replicates (see the Appendix). Although the method is presented within the framework of microarray experiments, it is a general algorithm that can be used to identify clusters in any situation. The method's performance is evaluated using simulated and publicly available data sets. Our results also clearly show that our method is not an extension of the conventional clustering methods based on correlation or Euclidean distance.
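The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way replicate-based confidence intervals could drive a grouping, not the method proposed in the paper. It assumes 95% Student-t intervals on the per-condition mean and assumes that two genes are linked when their intervals overlap in every condition, with clusters taken as the connected components of those links. The function names and the overlap rule are hypothetical and chosen for illustration.

```python
from statistics import mean, stdev

# Two-sided 95% Student-t critical values for df = n - 1,
# covering the low-replication cases of three or five replicates.
T_CRIT_95 = {2: 4.303, 4: 2.776}


def confidence_interval(replicates):
    """95% confidence interval for the mean of one gene in one condition."""
    n = len(replicates)
    half_width = T_CRIT_95[n - 1] * stdev(replicates) / n ** 0.5
    centre = mean(replicates)
    return (centre - half_width, centre + half_width)


def intervals_overlap(a, b):
    """True when the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def cluster_by_intervals(data):
    """Group genes whose intervals overlap in every condition.

    data maps a gene name to a list of conditions, each condition being
    a list of replicate measurements. This linking rule is an assumption
    made for illustration, not the rule used in the paper.
    """
    cis = {g: [confidence_interval(r) for r in conds] for g, conds in data.items()}
    genes = list(data)
    parent = {g: g for g in genes}

    def find(g):
        # Union-find root lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if all(intervals_overlap(a, b) for a, b in zip(cis[g], cis[h])):
                parent[find(g)] = find(h)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    example = {
        "gene_a": [[1.0, 1.1, 0.9], [2.0, 2.1, 1.9]],
        "gene_b": [[1.05, 1.15, 0.95], [2.05, 1.95, 2.0]],
        "gene_c": [[5.0, 5.1, 4.9], [0.0, 0.1, -0.1]],
    }
    print(cluster_by_intervals(example))
```

In this sketch no dissimilarity measure is chosen by hand: the replicate variability alone determines which genes are treated as indistinguishable. The price is that interval widths with only three replicates are large, so the grouping can be coarse.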
Index Terms
- Inferential Clustering Approach for Microarray Experiments with Replicated Measurements