
Multi-view prediction of protein function

Authors:
Artem Sokolov, Colorado State University, Fort Collins, CO
Asa Ben-Hur, Colorado State University, Fort Collins, CO

BCB '11: Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine, August 2011, pages 135–142. https://doi.org/10.1145/2147805.2147820

Published: 01 August 2011

ABSTRACT

The problem of predicting protein function using Gene Ontology terms is a hierarchical classification problem. A variety of genomic data are relevant to a protein's function: its sequence, its interactions with other proteins, the expression of its gene, etc. Some of these sources (interactions and expression) are species-specific, whereas protein sequence is comparable across species; this complicates the task of integrating labeled data from a target species with labeled data from other species. We address this problem using the methodology of structured output learning, present a framework based on multi-view learning that is naturally suited to combining both types of data, and demonstrate its effectiveness in making predictions for proteins in S. cerevisiae and M. musculus. The code for our framework is available at http://strut.sourceforge.net.
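The abstract frames GO term prediction as hierarchical classification. Because the Gene Ontology is organized as a directed acyclic graph, a widely used convention (the "true path rule") is that a protein annotated with a term is implicitly annotated with all of that term's ancestors, so a valid label set must be closed under taking ancestors. The following is a minimal Python sketch of that constraint only. It is not the authors' structured output method, and the term names, the toy hierarchy, and the helper functions `ancestors`, `propagate`, and `is_consistent` are all hypothetical, chosen for illustration.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy hierarchy (not real GO identifiers): each term maps to its direct parents.
# A term may have several parents, as in the GO directed acyclic graph.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "binding": {"root"},
    "catalytic_activity": {"root"},
    "protein_binding": {"binding"},
    "kinase_activity": {"catalytic_activity"},
    "protein_kinase_activity": {"kinase_activity", "protein_binding"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return all ancestors of `term` (excluding the term itself) via depth-first search."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(labels: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Close a label set under the ancestor relation (the 'true path rule')."""
    closed: Set[str] = set()
    for term in labels:
        closed.add(term)
        closed |= ancestors(term, parents)
    return closed


def is_consistent(labels: Set[str], parents: Dict[str, Set[str]]) -> bool:
    """A label set is hierarchy-consistent if it contains every ancestor of each member."""
    return all(ancestors(term, parents) <= labels for term in labels)


if __name__ == "__main__":
    direct = {"protein_kinase_activity"}
    full = propagate(direct, PARENTS)
    print(sorted(full))
    print(is_consistent(direct, PARENTS), is_consistent(full, PARENTS))  # prints False True
```

The traversal tracks visited terms rather than following a single parent chain because a term can have more than one parent (here, `protein_kinase_activity` inherits from both a binding branch and a catalytic branch), which is what makes GO prediction a structured rather than a flat multi-label problem.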

Index Terms

1. Applied computing
  1. Life and medical sciences
2. Computing methodologies
  1. Machine learning

Recommendations

- Combining homolog and motif similarity data with Gene Ontology relationships for protein function prediction. BIBM '12: Proceedings of the 2012 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
- A deep learning ensemble for function prediction of hypothetical proteins from pathogenic bacterial species
- Comparative analysis of amino acid composition in the active site of nirk gene encoding copper-containing nitrite reductase (CuNiR) in bacterial spp.


Published in
BCB '11: Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
August 2011
688 pages
ISBN: 9781450307963
DOI: 10.1145/2147805
General Chairs: Robert Grossman (University of Chicago), Andrey Rzhetsky (University of Chicago)
Program Chairs: Sun Kim (Indiana University Bloomington and Seoul National University), Wei Wang (University of North Carolina at Chapel Hill)
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
function prediction
kernel methods
multi-view learning
Qualifiers
- research-article
Acceptance Rates
Overall acceptance rate: 254 of 885 submissions, 29%
Article Metrics
- Total citations: 4
- Total downloads: 130
- Downloads (last 12 months): 3
- Downloads (last 6 weeks): 0