ABSTRACT
Predicting protein function using Gene Ontology terms is a hierarchical classification problem. A variety of genomic data are relevant to a protein's function: its sequence, its interactions with other proteins, the expression of its gene, and so on. Some of these sources (interactions and expression) are species-specific, whereas protein sequence is comparable across species; this mismatch complicates the task of integrating labeled data from a target species with labeled data from other species. We address this problem using the methodology of structured output learning, presenting a framework based on multi-view learning that is naturally suited to combining both types of data, and we demonstrate its effectiveness in making predictions for proteins in S. cerevisiae and M. musculus. The code for our framework is available at http://strut.sourceforge.net.
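The setting described above can be illustrated with a small sketch. It is not the authors' implementation, and the abstract does not specify their training procedure. It assumes a hypothetical five-term GO fragment, hand-picked dual coefficients instead of learned ones, and two hypothetical views: a "sequence" view available for every protein and a species-specific "network" view available only for target-species proteins. Labels are GO term sets closed under ancestors, so every prediction respects the hierarchy. Each view scores a candidate label set with a joint kernel K_x(x_i, x) · K_y(y_i, y), a common choice in kernel-based structured-output methods. The final prediction averages the scores over whichever views the query protein has. For tractability, only label sets seen in training are scored, which is also an assumption of this sketch.

```python
import math

# Hypothetical toy GO fragment: term -> parent terms (a DAG). Not real GO IDs.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalytic": ["root"],
    "dna_binding": ["binding"],
    "kinase": ["catalytic"],
}


def ancestor_closure(terms):
    """Return the terms plus all their ancestors, so the label set respects the hierarchy."""
    closed, stack = set(), list(terms)
    while stack:
        t = stack.pop()
        if t not in closed:
            closed.add(t)
            stack.extend(PARENTS[t])
    return frozenset(closed)


def label_kernel(y1, y2):
    """Cosine similarity between the indicator vectors of two GO label sets."""
    if not y1 or not y2:
        return 0.0
    return len(y1 & y2) / math.sqrt(len(y1) * len(y2))


def rbf(u, v, gamma=1.0):
    """Gaussian kernel on feature vectors (an assumed input-space kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def view_score(view, x, y, train):
    """f_view(x, y) = sum_i alpha_i * K_x(x_i, x) * K_y(y_i, y) over examples that have this view."""
    return sum(alpha * rbf(feats[view], x[view]) * label_kernel(yi, y)
               for feats, yi, alpha in train if view in feats)


def predict(x, train, candidates, views=("sequence", "network")):
    """Return the candidate label set with the highest score averaged over the query's views."""
    avail = [v for v in views if v in x]
    if not avail:
        raise ValueError("query has no features in any known view")

    def score(y):
        return sum(view_score(v, x, y, train) for v in avail) / len(avail)

    return max(candidates, key=score)


# Hypothetical training set: (features per view, label set, dual coefficient alpha).
# Target-species proteins have both views; the other-species protein has only "sequence".
TRAIN = [
    ({"sequence": [0.0, 1.0], "network": [1.0, 0.0]}, ancestor_closure({"dna_binding"}), 1.0),
    ({"sequence": [1.0, 0.0], "network": [0.0, 1.0]}, ancestor_closure({"kinase"}), 1.0),
    ({"sequence": [0.9, 0.1]}, ancestor_closure({"kinase"}), 0.5),
]
# Candidate outputs: the distinct label sets seen in training (sorted for determinism).
CANDIDATES = sorted({y for _, y, _ in TRAIN}, key=sorted)
```

For example, `predict({"sequence": [0.95, 0.05], "network": [0.1, 0.9]}, TRAIN, CANDIDATES)` combines both views for a target-species protein, while `predict({"sequence": [0.05, 0.95]}, TRAIN, CANDIDATES)` handles a protein that has only sequence features. Averaging per-view scores is the simplest way to combine views; a multi-view learner would instead couple the views during training.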