Hybrid Fuzzy Cluster Ensemble Framework for Tumor Clustering from Biomolecular Data

Authors:
Zhiwen Yu, South China University of Technology, Guangzhou and Hong Kong Polytechnic University, Hong Kong
Hantao Chen, South China University of Technology, Guangzhou
Jane You, Hong Kong Polytechnic University, Hong Kong
Guoqiang Han, South China University of Technology, Guangzhou
Le Li, South China University of Technology, Guangzhou

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 10, Issue 3, pp. 657–670, https://doi.org/10.1109/TCBB.2013.59

Published: 01 May 2013

Abstract

Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most existing studies adopt single clustering algorithms to perform tumor clustering from biomolecular data, and these algorithms lack robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce fuzzy theory into the cluster ensemble framework and propose four hybrid fuzzy cluster ensemble frameworks (HFCEFs), named HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, to identify samples that belong to different types of cancer. HFCEF-I and HFCEF-II differ in the ensemble generator they use to produce the set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to cluster along the sample dimension and generates the fuzzy matrices from a fuzzy membership function and the base samples selected by AP. HFCEF-II applies AP to cluster along the attribute dimension, generates a set of subspaces, and obtains the fuzzy matrices by performing fuzzy c-means (FCM) on those subspaces. HFCEF-III and HFCEF-IV combine the characteristics of HFCEF-I and HFCEF-II: HFCEF-III combines them in a serial way, while HFCEF-IV integrates them in a concurrent way. All HFCEFs use suitable consensus functions, such as FCM or the normalized cut algorithm (Ncut), to summarize the generated fuzzy matrices and obtain the final results.
Experiments on real data sets from the UCI Machine Learning Repository and on cancer gene expression profiles show that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real data sets, especially biomolecular data, and 2) the proposed approaches provide more robust, stable, and accurate results than state-of-the-art single clustering algorithms and traditional cluster ensemble approaches.
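The generator and consensus steps summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: similarities are negative squared Euclidean distances with the median similarity as the AP preference; HFCEF-I-style members gain diversity by running AP on a random 80% subset of samples and then applying the standard FCM membership function to the selected base samples; HFCEF-II-style members take one random attribute from each AP attribute cluster as a subspace and run fuzzy c-means on it; and the consensus step concatenates all fuzzy matrices and clusters them with FCM (the Ncut alternative is not shown). All function names are hypothetical.

```python
import numpy as np


def affinity_propagation(X, damping=0.5, n_iter=200):
    """Return exemplar row indices of X using Frey-Dueck message passing."""
    n = len(X)
    rows = np.arange(n)
    # Similarity: negative squared Euclidean distance; preference: median similarity.
    S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    S[rows, rows] = np.median(S[~np.eye(n, dtype=bool)])
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    for _ in range(n_iter):
        # Responsibility r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')].
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # Availability a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k});
        # self-availability a(k,k) = sum of positive r(i',k) over i' != k.
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = self_avail
        A = damping * A + (1 - damping) * A_new
    score = np.diag(A + R)
    exemplars = np.flatnonzero(score > 0)
    return exemplars if exemplars.size else np.array([score.argmax()])


def fuzzy_membership(X, centers, m=2.0):
    """FCM membership of each row of X to each center; every row sums to 1."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means; returns the n x c membership matrix."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        U = fuzzy_membership(X, centers, m)
    return U


def sample_ap_member(X, rng, frac=0.8):
    """HFCEF-I-style member (assumed): AP picks base samples from a random subset."""
    idx = rng.choice(len(X), size=max(2, int(frac * len(X))), replace=False)
    base = idx[affinity_propagation(X[idx])]
    return fuzzy_membership(X, X[base])


def attribute_groups(X):
    """Cluster the attributes (columns of X) with AP; return index arrays per group."""
    F = X.T
    ex = affinity_propagation(F)
    labels = ((F[:, None, :] - F[ex][None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return [g for g in (np.flatnonzero(labels == k) for k in range(len(ex))) if g.size]


def attribute_ap_member(X, c, rng, groups):
    """HFCEF-II-style member (assumed): FCM on a subspace of one attribute per group."""
    subspace = [rng.choice(g) for g in groups]
    return fuzzy_c_means(X[:, subspace], c, seed=int(rng.integers(1 << 30)))


def hybrid_fuzzy_ensemble(X, c, n_members=10, seed=0):
    """Alternate both member types, then apply an FCM consensus to the stacked matrices."""
    rng = np.random.default_rng(seed)
    groups = attribute_groups(X)
    members = [sample_ap_member(X, rng) if t % 2 == 0
               else attribute_ap_member(X, c, rng, groups)
               for t in range(n_members)]
    # Consensus (assumed): treat the concatenated memberships as new features.
    U = fuzzy_c_means(np.hstack(members), c, seed=seed)
    return U.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy data: 3 groups of 20 samples with 12 attributes each.
    X = np.vstack([rng.normal(mu, 0.3, size=(20, 12)) for mu in (0.0, 2.0, 4.0)])
    print(hybrid_fuzzy_ensemble(X, c=3))
```

Alternating the two member types in a single ensemble is only a loose stand-in for the concurrent HFCEF-IV; the serial combination used by HFCEF-III is not modeled here.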

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
2. Information systems
  1. Information systems applications
    1. Data mining
      1. Clustering

Published in

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 10, Issue 3
May 2013
272 pages
ISSN: 1545-5963

Publisher
IEEE Computer Society Press
Washington, DC, United States

Author Tags
Cluster ensemble
cancer discovery
gene expression profiles
tumor clustering

Article Metrics
- Total citations: 17
- Total downloads: 61
- Downloads (last 12 months): 5
- Downloads (last 6 weeks): 0
