skip to main content
article

Evolutionary Optimization of Kernel Weights Improves Protein Complex Comembership Prediction

Published:01 July 2009Publication History
Skip Abstract Section

Abstract

In recent years, more and more high-throughput data sources useful for protein complex prediction have become available (e.g., gene sequence, mRNA expression, and interactions). The integration of these different data sources can be challenging. Recently, it has been recognized that kernel-based classifiers are well suited for this task. However, the different kernels (data sources) are often combined using equal weights. Although several methods have been developed to optimize kernel weights, no large-scale example of an improvement in classifier performance has been shown yet. In this work, we employ an evolutionary algorithm to determine weights for a larger set of kernels by optimizing a criterion based on the area under the ROC curve. We show that setting the right kernel weights can indeed improve performance. We compare this to the existing kernel weight optimization methods (i.e., (regularized) optimization of the SVM criterion or aligning the kernel with an ideal kernel) and find that these do not result in a significant performance improvement and can even cause a decrease in performance. Results also show that an expert approach of assigning high weights to features with high individual performance is not necessarily the best strategy.

References

  1. B. Schölkopf, K. Tsuda, and J.-P. Vert, Kernel Methods in Computational Biology. MIT Press, 2004.Google ScholarGoogle Scholar
  2. J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. Cambridge Univ. Press, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. A. Ben-Hur, C.S. Ong, S. Sonnenburg, B. Schölkopf, and G. Rätsch, "Support Vector Machines and Kernels for Computational Biology," PLoS Computational Biology, vol. 4, no. 10, Oct. 2008.Google ScholarGoogle Scholar
  4. C. Cortes and V. Vapnik, "Support Vector Networks," Machine Learning, vol. 20, pp. 273-297, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Y. Ho et al., "Systematic Identification of Protein Complexes in Saccharomyces Cerevisiae by Mass Spectrometry," Nature, vol. 415, pp. 180-183, Jan. 2002.Google ScholarGoogle ScholarCross RefCross Ref
  6. A.-C. Gavin et al., "Functional Organization of the Yeast Proteome by Systematic Analysis of Protein Complexes," Nature, vol. 415, pp. 141-147, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  7. G.D. Bader and C.W.V. Hogue, "Analyzing Yeast Protein-Protein Interaction Data Obtained from Different Sources," Nature Biotechnology, vol. 20, pp. 991-997, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  8. C. von Mering, R. Krause, B. Snel, M. Cornell, S.G. Oliver, S. Fields, and P. Bork, "Comparative Assessment of Large-Scale Data Sets of Protein-Protein Interactions," Nature, vol. 417, pp. 399-403, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  9. R. Jansen and M. Gerstein, "Analyzing Protein Function on a Genomic Scale: The Importance of Gold-Standard Positives and Negatives for Network Prediction," Current Opinion in Microbiology , vol. 7, pp. 535-545, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  10. G.D. Bader and C.W.V. Hogue, "An Automated Method for Finding Molecular Complexes in Large Protein Interaction Networks," BMC Bioinformatics, vol. 4, 2003.Google ScholarGoogle Scholar
  11. R. Jansen, Y. Haiyuan, D. Greenbaum, Y. Kluger, N.J. Krogan, S. Chung, A. Emili, M. Snyder, J.F. Greenblatt, and M. Gerstein, "A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data," Science, vol. 302, pp. 449-453, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  12. L.V. Zhang, S.L. Wong, O.D. King, and F.P. Roth, "Predicting Co-Complexed Protein Pairs Using Genomic and Proteomic Data Integration," BMC Bioinformatics, vol. 5, i38-i46, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  13. R.J.P. van Berlo, L.F.A. Wessels, D. de Ridder, and M.J.T. Reinders, "Protein Complex Prediction Using an Integrative Bioinformatics Approach," J. Bioinformatics and Computational Biology, vol. 4, pp. 839-861, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  14. Y. Qi, Z. Bar-Joseph, and J. Klein-Seetharaman, "Evaluation of Different Biological Data and Computational Classification Methods for Use in Protein Interaction Prediction," Proteins: Structure, Function, and Bioinformatics, vol. 63, pp. 490-500, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  15. G.R.G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M.I. Jordan, "Learning the Kernel Matrix with Semidefinite Programming," Proc. 19th Int'l Conf. Machine Learning (ICML '02), pp. 323- 330, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. G.R.G. Lanckriet, T. de Bie, N. Cristianini, M.I. Jordan, and W.S. Noble, "A Statistical Framework for Genomic Data Fusion," Bioinformatics, vol. 20, pp. 2626-2635, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, "Large Scale Multiple Kernel Learning," J. Machine Learning Research, vol. 7, pp. 1531-1565, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, "Choosing Multiple Parameters for Support Vector Machines," Machine Learning, vol. 46, pp. 131-159, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. O. Bousquet and D. Herrmann, "On the Complexity of Learning the Kernel Matrix," Advances in Neural Information Processing Systems, S. Becker, S. Thrun, and K. Obermayer, eds., vol. 15. pp. 415-422, MIT Press, 2003.Google ScholarGoogle Scholar
  20. N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. Kandola, "On Kernel Target Alignment," Advances in Neural Information Processing Systems, vol. 14, pp. 367-374, MIT Press, 2002.Google ScholarGoogle Scholar
  21. P. Pavlidis, J. Weston, C. Jinsong, and W.S. Noble, "Learning Gene Functional Classifications from Multiple Data Types," J. Computational Biology, vol. 9, pp. 401-412, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  22. A. Ben-Hur and W.S. Noble, "Kernel Methods for Predicting Protein-Protein Interactions," Bioinformatics, vol. 21, no. 1, pp. i38- i46, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. Martin, D. Roe, and J.-L. Faulon, "Predicting Protein-Protein Interactions Using Signature Products," Bioinformatics, vol. 21, pp. 218-226, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. T. Phienthrakul and B. Kijsirikul, "Evolutionary Strategies for Multi-Scale Radial Basis Function Kernels in Support Vector Machines," Proc. 2005 Conf. Genetic and Evolutionary Computation, pp. 905-911, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. F. Friedrichs and C. Igel, "Evolutionary Tuning of Multiple SVM Parameters," Proc. European Symp. Artificial Neural Networks (ESANN), pp. 519-524, 2004.Google ScholarGoogle Scholar
  26. T. De Bie, C.-L. Tranchevent, L.M.M. van Oeffelen, and Y. Moreau, "Kernel-Based Data Fusion for Gene Prioritization," Bioinformatics, pp. i125-i132, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. N. Hansen, S.D. Müller, and P. Koumoutsakos, "Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES)," Evolutionary Computation , vol. 11, pp. 1-18, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. A. Auger and N. Hansen, "A Restart CMA Evolution Strategy with Increasing Population Size," Proc. IEEE Congress Evolutionary Computation (CEC '05), vol. 2, pp. 1769-1776, 2005.Google ScholarGoogle Scholar
  29. F. Bach, G.R.G. Lanckriet, and M.I. Jordan, "Multiple Kernel Learning, Conic Duality, and the SMO Algorithm," Proc. 21st Int'l Conf. Machine Learning, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. F. Bach, R. Thibaux, and M.I. Jordan, "Computing Regularization Paths for Learning Multiple Kernels," J. Machine Learning Research, vol. 5, pp. 1391-1415, 2004.Google ScholarGoogle Scholar
  31. H.W. Mewes, K. Heumann, A. Kaps, K. Mayer, F. Pfeiffer, S. Stocker, and D. Frishman, "MIPS: A Database for Genomes and Protein Sequences," Nucleic Acids Research, vol. 28, pp. 37-40, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  32. A. Ben-Hur and W.S. Noble, "Choosing Negative Examples for the Prediction of Protein-Protein Interactions," BMC Bioinformatics , vol. 7, no. 1, 2006.Google ScholarGoogle Scholar
  33. J. Gollub et al., "The Stanford Microarray Database: Data Access and Quality Assessment Tools," Nucleic Acids Research, vol. 31, no. 1, pp. 94-96, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  34. T.R. Hughes et al., "Functional Discovery via a Compendium of Expression Profiles," Cell, vol. 102, pp. 109-126, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  35. S.L. Tai, V.M. Boer, P. Daran-Lapujade, M.C. Walsh, J.H. de Winde, J.-M. Daran, and J.T. Pronk, "Two-Dimensional Transcriptome Analysis in Chemostat Cultures: Combinatorial Effects of Oxygen Availability and Macronutrient Limitation in Saccharomyces Cerevisiae," J. Biological Chemistry, vol. 280, no. 1, pp. 437- 447, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  36. M.J. Dunham, H. Badrane, T. Ferea, J. Adams, P.O. Brown, F. Rosenzweig, and D. Botstein, "Characteristic Genome Rearrangements in Experimental Evolution of Saccharomyces Cerevisiae," Proc. Nat'l Academy of Sciences USA, vol. 99, pp. 16144-16149, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  37. J.D. Lieb, L. Xiaole, D. Botstein, and P.O. Brown, "Promoter-Specific Binding of Rap1 Revealed by Genome-Wide Maps of Protein-DNA Association," Nature Genetics, vol. 28, pp. 327-334, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  38. C.T. Harbison et al., "Transcriptional Regulatory Code of a Eukaryotic Genome," Nature, vol. 431, pp. 99-104, Sept. 2004.Google ScholarGoogle ScholarCross RefCross Ref
  39. N.J. Krogan et al., "High-Definition Macromolecular Composition of Yeast RNA-Processing Complexes," Molecular Cell, vol. 13, pp. 225-239, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  40. I. Xenarios, L. Salwinski, X.J. Duan, P. Higney, S.-M. Kim, and D. Eisenberg, "DIP, the Database of Interacting Proteins: A Research Tool for Studying Cellular Networks of Protein Interactions," Nucleic Acids Research, vol. 30, no. 1, pp. 303-305, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  41. L.J. Lu, Y. Xia, A. Paccanaro, H. Yu, and M. Gerstein, "Assessing the Limits of Genomic Data Integration for Predicting Protein Networks," Genome Research, vol. 15, no. 7, pp. 945-953, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  42. H. Yu, N.M. Luscombe, H.X. Lu, X. Zhu, Y. Xia, J.-D.J. Han, N. Bertin, S. Chung, M. Vidal, and M. Gerstein, "Annotation Transfer between Genomes: Protein-Protein Interologs and Protein-dna Regulogs," Genome Research, vol. 14, no. 6, pp. 1107-1118, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  43. C.-C. Chang and C.-J. Lin, "LIBSVM: A Library for Support Vector Machine," http://www.csie.ntu.edu.tw/cjlin/libsvm, 2008.Google ScholarGoogle Scholar
  44. R.P.W. Duin, P. Juszczak, P. Paclik, E. Pekalska, D. de Ridder, and D.M.J. Tax, "PRTools, A Matlab Toolbox for Pattern Recognition," http://prtools.org, 2009.Google ScholarGoogle Scholar
  45. C.-W. Hsu, C.-C. Chang, and L. Chih-Jen, "A Practical Guide to Support Vector Classification," technical report, Dept. of Computer Science and Information Eng., Nat'l Taiwan Univ., 2003.Google ScholarGoogle Scholar
  46. D.P. Lewis, T. Jebara, and W.S. Noble, "Support Vector Machine Learning from Heterogeneous Data: An Empirical Analysis Using Protein Sequence and Structure," Bioinformatics, vol. 22, pp. 2753- 2760, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. S.L. Wong et al., "Combining Biological Networks to Predict Genetic Interactions," Proc. Nat'l Academy of Sciences USA, vol. 101, pp. 15682-15687, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  48. W.-C. Kao, K.-M. Chung, C.-L. Sun, and C.-J. Lin, "Decomposition Methods for Linear Support Vector Machines," Neural Computation , vol. 16, no. 8, pp. 1689-1704, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. T. Joachims, "A Support Vector Method for Multivariate Performance Measures," Proc. 22nd Int'l Conf. Machine Learning (ICML '05), pp. 377-384, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Evolutionary Optimization of Kernel Weights Improves Protein Complex Comembership Prediction

                  Recommendations

                  Comments

                  Login options

                  Check if you have access through your login credentials or your institution to get full access on this article.

                  Sign in

                  Full Access

                  PDF Format

                  View or Download as a PDF file.

                  PDF

                  eReader

                  View online with eReader.

                  eReader