Hybrid Fuzzy Cluster Ensemble Framework for Tumor Clustering from Biomolecular Data

Published: 01 May 2013

Abstract

Cancer class discovery using biomolecular data is one of the most important tasks for cancer diagnosis and treatment. Tumor clustering from gene expression data provides a new way to perform cancer class discovery. Most existing research adopts single clustering algorithms to perform tumor clustering from biomolecular data, and these algorithms lack robustness, stability, and accuracy. To further improve the performance of tumor clustering from biomolecular data, we introduce fuzzy theory into the cluster ensemble framework and propose four kinds of hybrid fuzzy cluster ensemble frameworks (HFCEFs), named HFCEF-I, HFCEF-II, HFCEF-III, and HFCEF-IV, to identify samples that belong to different types of cancer. HFCEF-I and HFCEF-II differ in the ensemble generator they use to produce the set of fuzzy matrices in the ensemble. Specifically, HFCEF-I applies the affinity propagation algorithm (AP) to perform clustering on the sample dimension and generates a set of fuzzy matrices in the ensemble based on the fuzzy membership function and the base samples selected by AP. HFCEF-II adopts AP to perform clustering on the attribute dimension, generates a set of subspaces, and obtains a set of fuzzy matrices in the ensemble by performing fuzzy c-means on the subspaces. HFCEF-III and HFCEF-IV combine the characteristics of both HFCEF-I and HFCEF-II: HFCEF-III combines them in a serial way, while HFCEF-IV integrates them in a concurrent way. All HFCEFs adopt suitable consensus functions, such as the fuzzy c-means algorithm or the normalized cut algorithm (Ncut), to summarize the generated fuzzy matrices and obtain the final results.
The experiments on real data sets from the UCI Machine Learning Repository and on cancer gene expression profiles illustrate that 1) the proposed hybrid fuzzy cluster ensemble frameworks work well on real data sets, especially biomolecular data, and 2) the proposed approaches provide more robust, stable, and accurate results than state-of-the-art single clustering algorithms and traditional cluster ensemble approaches.
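
The idea of turning base samples into fuzzy matrices and then summarizing them can be illustrated with a small sketch. It is not the authors' implementation, and several details are assumptions: the abstract does not specify the fuzzy membership function, so the standard fuzzy c-means membership formula is used here; the base samples are picked by hand rather than selected by AP; and the averaged co-membership matrix is only one plausible input for a graph-based consensus function such as Ncut. The names `fuzzy_memberships` and `co_membership` are hypothetical.

```python
import math

def fuzzy_memberships(samples, bases, m=2.0):
    """Build a fuzzy matrix U where U[i][j] is the degree to which
    samples[i] belongs to bases[j]. Assumption: the standard fuzzy
    c-means membership formula with fuzzifier m > 1."""
    exponent = 2.0 / (m - 1.0)
    matrix = []
    for x in samples:
        dists = [math.dist(x, b) for b in bases]
        if any(d == 0.0 for d in dists):
            # The sample coincides with a base sample: share full
            # membership among the exact matches to avoid dividing by zero.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            matrix.append([h / total for h in hits])
            continue
        matrix.append(
            [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
        )
    return matrix

def co_membership(fuzzy_matrices):
    """Average over the ensemble the fuzzy co-membership
    S[i][k] = sum_j U[i][j] * U[k][j] (an illustrative similarity)."""
    n = len(fuzzy_matrices[0])
    sim = [[0.0] * n for _ in range(n)]
    for U in fuzzy_matrices:
        for i in range(n):
            for k in range(n):
                sim[i][k] += sum(a * b for a, b in zip(U[i], U[k]))
    return [[v / len(fuzzy_matrices) for v in row] for row in sim]

# Toy data: two well-separated groups of two samples each.
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]

# Two ensemble members, each built from a different hand-picked set of
# base samples (the paper selects base samples with AP instead).
ensemble = [
    fuzzy_memberships(samples, [samples[0], samples[2]]),
    fuzzy_memberships(samples, [samples[1], samples[3]]),
]
similarity = co_membership(ensemble)
```

In this toy setting, samples from the same group receive a much higher averaged co-membership than samples from different groups, which is the kind of structure a consensus function would then partition.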
