DOI: 10.1145/956863.956942
Article

Mining multiple phenotype structures underlying gene expression profiles

Published: 3 November 2003

ABSTRACT

DNA microarray technology is now widely used in basic biomedical research for mRNA expression profiling and is increasingly being used to explore patterns of gene expression in clinical research. Automatically detecting phenotype structures from gene expression profiles can provide deep insight into the nature of many diseases and can lead to the development of new drugs. While most previous studies focus only on mining the empirical phenotype structure controlled by the experiment, it is also of interest to detect possible hidden phenotype structures underlying gene expression profiles. Since the number of samples is usually limited, such data sets are very sparse in the high-dimensional gene space. Furthermore, most of the genes of interest are buried in a large amount of noise. Unsupervised phenotype structure discovery in such sparse, high-dimensional data sets therefore presents interesting but challenging problems. In this paper, we propose a model for simultaneously mining both empirical and hidden phenotype structures from gene expression data. We demonstrate the effectiveness and efficiency of the proposed method on various real-world data sets.
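The abstract's central idea can be illustrated with a toy simulation. That idea is that one expression matrix can carry more than one phenotype structure (a partition of the samples), each supported by a small subset of genes hidden among many noise genes. This is a hypothetical sketch and not the paper's method. Both partitions are given here rather than discovered. The functions `simulate`, `welch_t`, and `top_genes` are assumptions chosen for illustration, as are the matrix size, the ten marker genes per structure, the shift of 5, and the use of Welch's t statistic as a gene score.

```python
import random
import statistics


def welch_t(values, labels):
    """Welch's t statistic for one gene: mean of group 1 minus mean of group 0."""
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    se = (statistics.variance(g0) / len(g0) + statistics.variance(g1) / len(g1)) ** 0.5
    return (statistics.fmean(g1) - statistics.fmean(g0)) / se


def simulate(n_samples=20, n_genes=1000, n_markers=10, shift=5.0, seed=0):
    """Toy genes x samples matrix carrying two unrelated sample partitions."""
    rng = random.Random(seed)
    # Hypothetical "empirical" structure: the condition the experiment controls.
    empirical = [0 if s < n_samples // 2 else 1 for s in range(n_samples)]
    # Hypothetical "hidden" structure; with the default 20 samples it is
    # balanced against the empirical one (5 of each label in each group).
    hidden = [s % 2 for s in range(n_samples)]
    # Every gene starts as pure Gaussian noise.
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_genes)]
    for g in range(n_markers):
        for s in range(n_samples):
            matrix[g][s] += shift * empirical[s]          # genes 0 .. n_markers-1
            matrix[n_markers + g][s] += shift * hidden[s]  # the next n_markers genes
    return matrix, empirical, hidden


def top_genes(matrix, labels, k):
    """Indices of the k genes whose |Welch t| best separates the given partition."""
    scores = [abs(welch_t(row, labels)) for row in matrix]
    return sorted(range(len(matrix)), key=lambda g: scores[g], reverse=True)[:k]


if __name__ == "__main__":
    matrix, empirical, hidden = simulate()
    print("top genes for empirical partition:", sorted(top_genes(matrix, empirical, 10)))
    print("top genes for hidden partition:   ", sorted(top_genes(matrix, hidden, 10)))
```

Under these assumptions, the two rankings should select largely disjoint gene subsets, even though both come from the same 20 samples and 1,000 mostly noisy genes. In real data neither partition's supporting genes are known in advance, and the hidden partition itself must be found without labels. That is the harder, unsupervised problem the abstract describes.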


      Published in
      CIKM '03: Proceedings of the twelfth international conference on Information and knowledge management
      November 2003
      592 pages
      ISBN:1581137230
      DOI:10.1145/956863

      Copyright © 2003 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Acceptance Rates

      Overall acceptance rate: 1,861 of 8,427 submissions (22%)
