Abstract
Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most existing research applies a single clustering algorithm to biomolecular data, an approach that lacks robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce fuzzy theory into the cluster ensemble framework and propose four hybrid fuzzy cluster ensemble frameworks (HFCEFs), named HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, to identify samples that belong to different types of cancer. HFCEF-I and HFCEF-II differ in the ensemble generator they use to produce the set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to cluster on the sample dimension and generates the fuzzy matrices from a fuzzy membership function and the base samples selected by AP. HFCEF-II applies AP to cluster on the attribute dimension, generates a set of subspaces, and obtains the fuzzy matrices by running fuzzy c-means on those subspaces. HFCEF-III and HFCEF-IV draw on the characteristics of both: HFCEF-III combines HFCEF-I and HFCEF-II in a serial way, while HFCEF-IV integrates them in a concurrent way. All HFCEFs adopt a suitable consensus function, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize the generated fuzzy matrices and obtain the final results.
Experiments on real data sets from the UCI Machine Learning Repository and on cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real data sets, especially biomolecular data, and 2) the proposed approaches provide more robust, stable, and accurate results than state-of-the-art single clustering algorithms and traditional cluster ensemble approaches.
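As a rough illustration of the generator-plus-consensus idea described for HFCEF-I, the following NumPy sketch builds an ensemble of fuzzy membership matrices from base samples and summarizes them with fuzzy c-means. It is not the authors' implementation and rests on several assumptions: base samples are drawn at random as a stand-in for AP exemplars, the membership function is the standard fuzzy c-means one with fuzzifier `m`, and the consensus step simply concatenates the fuzzy matrices column-wise before clustering. All function and parameter names are hypothetical.

```python
import numpy as np


def fuzzy_memberships(X, centers, m=2.0, eps=1e-12):
    """FCM-style membership of each row of X to each row of centers."""
    # Pairwise Euclidean distances, shape (n_samples, n_centers).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, eps)  # avoid division by zero when a sample is a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1


def fuzzy_c_means(X, k, m=2.0, n_iter=100, tol=1e-6, rng=None):
    """Plain fuzzy c-means; returns (membership matrix, cluster centers)."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        W = fuzzy_memberships(X, centers, m) ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return fuzzy_memberships(X, centers, m), centers


def generate_ensemble(X, n_members=10, n_base=5, m=2.0, rng=None):
    """One fuzzy matrix per ensemble member, built from base samples."""
    rng = np.random.default_rng(rng)
    ensemble = []
    for _ in range(n_members):
        # ASSUMPTION: random base samples replace the exemplars that
        # affinity propagation would select in the paper's HFCEF-I.
        base = X[rng.choice(len(X), size=n_base, replace=False)]
        ensemble.append(fuzzy_memberships(X, base, m))
    return ensemble


def consensus_labels(ensemble, k, rng=None):
    """Summarize the fuzzy matrices with fuzzy c-means as consensus function."""
    # ASSUMPTION: the fuzzy matrices are concatenated column-wise and the
    # resulting feature matrix is clustered; the paper may combine them differently.
    F = np.hstack(ensemble)
    U, _ = fuzzy_c_means(F, k, rng=rng)
    return U.argmax(axis=1)


# Toy usage on synthetic "expression" data: 40 samples, 50 attributes.
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(0.0, 0.3, (20, 50)),
               data_rng.normal(3.0, 0.3, (20, 50))])
ensemble = generate_ensemble(X, n_members=10, n_base=5, rng=1)
labels = consensus_labels(ensemble, k=2, rng=2)
```

A sketch in the spirit of HFCEF-II would instead partition the columns of `X` into subspaces and call `fuzzy_c_means` on each subspace to produce the ensemble members, while the consensus step would stay the same.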