DOI: 10.1145/2147805.2147820
research-article

Multi-view prediction of protein function

Published: 01 August 2011

ABSTRACT

The problem of predicting protein function using Gene Ontology terms is a hierarchical classification problem. There are a variety of genomic data that are relevant to a protein's function: its sequence, its interactions with other proteins, expression of its gene, etc. Some of these sources (interactions and expression) are species-specific, while protein sequence is comparable across species, which complicates the task of integrating labeled data from a target species with labeled data from other species. We address this problem using the methodology of structured output learning, present a framework based on multi-view learning that is naturally suited for combining both types of data, and demonstrate its effectiveness in making predictions for proteins in S. cerevisiae and M. musculus. The code for our framework is available at http://strut.sourceforge.net.
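As an illustration of what "hierarchical" means for the label space, the sketch below assumes the usual Gene Ontology convention that a protein annotated with a term is implicitly annotated with all of that term's ancestors. It is not the authors' framework: the toy hierarchy, the term names, and the functions `ancestor_closure` and `is_consistent` are hypothetical and exist only for this example.

```python
# Toy hierarchy: child term -> set of parent terms.
# The term names are hypothetical placeholders, not real GO identifiers.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalytic_activity": {"root"},
    "protein_binding": {"binding"},
    "kinase_activity": {"catalytic_activity"},
    "kinase_binding": {"protein_binding"},
}


def ancestor_closure(terms, parents=PARENTS):
    """Return the given terms together with all of their ancestors."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents[term])
    return closed


def is_consistent(terms, parents=PARENTS):
    """A label set is hierarchy-consistent if it already contains every ancestor."""
    return set(terms) == ancestor_closure(terms, parents)
```

Under this assumption, a predictor of independent per-term labels can output inconsistent sets (a child term without its parent), whereas a method that predicts the whole label set at once can restrict its outputs to consistent ones.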


            Published in

            BCB '11: Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
            August 2011
            688 pages
            ISBN: 9781450307963
            DOI: 10.1145/2147805
            General Chairs: Robert Grossman, Andrey Rzhetsky
            Program Chairs: Sun Kim, Wei Wang

            Copyright © 2011 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Acceptance Rates

            Overall Acceptance Rate: 254 of 885 submissions, 29%
