Coclustering of Human Cancer Microarrays Using Minimum Sum-Squared Residue Coclustering

Authors:

Inderjit S. Dhillon

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), Volume 5, Issue 3

Pages 385 - 400

https://doi.org/10.1109/TCBB.2007.70268

Published: 01 July 2008

Abstract

There is a consensus in microarray analysis that identifying local patterns, characterized by coherent groups of genes and conditions, may help uncover previously undetectable cellular processes of genes as well as macroscopic phenotypes of related samples. To cluster genes and conditions simultaneously, we previously developed a fast co-clustering algorithm, Minimum Sum-Squared Residue Co-clustering (MSSRCC), which employs an alternating minimization scheme and generates what we call co-clusters in a checkerboard structure. In this paper, we propose specific strategies that enable MSSRCC to escape poor local minima and resolve the degeneracy problem in partitional clustering algorithms. The strategies are binormalization, deterministic spectral initialization, and incremental local search. We assess the effects of these strategies on both synthetic gene expression datasets and real human cancer microarrays and provide empirical evidence that MSSRCC with the proposed strategies performs better than existing co-clustering and clustering algorithms. In particular, the combination of all three strategies leads to the best performance. Furthermore, we illustrate the coherence of the resulting co-clusters in a checkerboard structure, where genes in a co-cluster manifest the phenotype structure of the corresponding samples, and evaluate the enrichment of functional annotations in Gene Ontology (GO).
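To make the alternating minimization idea concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not define the residue, so the sketch assumes one common choice: each entry is approximated by the mean of its co-cluster, and the objective is the sum of squared differences. The function names (`cocluster_means`, `objective`, `alternate`) are hypothetical, and binormalization, spectral initialization, and incremental local search are deliberately left out.

```python
import numpy as np


def cocluster_means(A, rows, cols, k, l):
    """Return the k x l matrix of co-cluster means (empty blocks get 0)."""
    M = np.zeros((k, l))
    for r in range(k):
        for c in range(l):
            block = A[np.ix_(rows == r, cols == c)]
            if block.size:
                M[r, c] = block.mean()
    return M


def objective(A, rows, cols, k, l):
    """Sum of squared residues of A against its block-mean approximation.

    Assumption: the residue is A minus the co-cluster mean; the abstract
    itself does not spell out the definition.
    """
    M = cocluster_means(A, rows, cols, k, l)
    return float(((A - M[rows][:, cols]) ** 2).sum())


def alternate(A, rows, cols, k, l, n_iter=20):
    """Alternately reassign columns, then rows, to their closest prototypes."""
    rows, cols = rows.copy(), cols.copy()
    for _ in range(n_iter):
        M = cocluster_means(A, rows, cols, k, l)
        # Column step: column j moves to the column cluster c whose
        # prototype M[rows, c] is closest in squared distance.
        col_cost = ((A[:, :, None] - M[rows][:, None, :]) ** 2).sum(axis=0)
        cols = col_cost.argmin(axis=1)
        M = cocluster_means(A, rows, cols, k, l)
        # Row step: the symmetric update with prototypes M[r, cols].
        row_cost = ((A[:, None, :] - M[:, cols][None, :, :]) ** 2).sum(axis=2)
        rows = row_cost.argmin(axis=1)
    return rows, cols
```

Under this assumed residue, neither the reassignment steps nor the recomputation of block means can increase the objective, so the procedure converges to a local minimum. It can still stall in a poor one or leave a cluster empty, which is the degeneracy problem the proposed strategies are meant to address.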

Index Terms

Coclustering of Human Cancer Microarrays Using Minimum Sum-Squared Residue Coclustering
1. Applied computing
  1. Life and medical sciences
2. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
      2. Unsupervised learning
        Cluster analysis
    2. Machine learning approaches
      1. Classification and regression trees

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics Volume 5, Issue 3

July 2008

159 pages

ISSN: 1545-5963

Publisher

IEEE Computer Society Press

Washington, DC, United States
