Abstract
In recent years, more and more high-throughput data sources useful for protein complex prediction have become available (e.g., gene sequence, mRNA expression, and interactions). The integration of these different data sources can be challenging. Recently, it has been recognized that kernel-based classifiers are well suited for this task. However, the different kernels (data sources) are often combined using equal weights. Although several methods have been developed to optimize kernel weights, no large-scale example of an improvement in classifier performance has been shown yet. In this work, we employ an evolutionary algorithm to determine weights for a larger set of kernels by optimizing a criterion based on the area under the ROC curve. We show that setting the right kernel weights can indeed improve performance. We compare this to the existing kernel weight optimization methods (i.e., (regularized) optimization of the SVM criterion or aligning the kernel with an ideal kernel) and find that these do not result in a significant performance improvement and can even cause a decrease in performance. Results also show that an expert approach of assigning high weights to features with high individual performance is not necessarily the best strategy.
- B. Schölkopf, K. Tsuda, and J.-P. Vert, Kernel Methods in Computational Biology. MIT Press, 2004.Google Scholar
- J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. Cambridge Univ. Press, 2004. Google ScholarDigital Library
- A. Ben-Hur, C.S. Ong, S. Sonnenburg, B. Schölkopf, and G. Rätsch, "Support Vector Machines and Kernels for Computational Biology," PLoS Computational Biology, vol. 4, no. 10, Oct. 2008.Google Scholar
- C. Cortes and V. Vapnik, "Support Vector Networks," Machine Learning, vol. 20, pp. 273-297, 1995. Google ScholarDigital Library
- Y. Ho et al., "Systematic Identification of Protein Complexes in Saccharomyces Cerevisiae by Mass Spectrometry," Nature, vol. 415, pp. 180-183, Jan. 2002.Google ScholarCross Ref
- A.-C. Gavin et al., "Functional Organization of the Yeast Proteome by Systematic Analysis of Protein Complexes," Nature, vol. 415, pp. 141-147, 2002.Google ScholarCross Ref
- G.D. Bader and C.W.V. Hogue, "Analyzing Yeast Protein-Protein Interaction Data Obtained from Different Sources," Nature Biotechnology, vol. 20, pp. 991-997, 2002.Google ScholarCross Ref
- C. von Mering, R. Krause, B. Snel, M. Cornell, S.G. Oliver, S. Fields, and P. Bork, "Comparative Assessment of Large-Scale Data Sets of Protein-Protein Interactions," Nature, vol. 417, pp. 399-403, 2002.Google ScholarCross Ref
- R. Jansen and M. Gerstein, "Analyzing Protein Function on a Genomic Scale: The Importance of Gold-Standard Positives and Negatives for Network Prediction," Current Opinion in Microbiology , vol. 7, pp. 535-545, 2004.Google ScholarCross Ref
- G.D. Bader and C.W.V. Hogue, "An Automated Method for Finding Molecular Complexes in Large Protein Interaction Networks," BMC Bioinformatics, vol. 4, 2003.Google Scholar
- R. Jansen, Y. Haiyuan, D. Greenbaum, Y. Kluger, N.J. Krogan, S. Chung, A. Emili, M. Snyder, J.F. Greenblatt, and M. Gerstein, "A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data," Science, vol. 302, pp. 449-453, 2003.Google ScholarCross Ref
- L.V. Zhang, S.L. Wong, O.D. King, and F.P. Roth, "Predicting Co-Complexed Protein Pairs Using Genomic and Proteomic Data Integration," BMC Bioinformatics, vol. 5, i38-i46, 2004.Google ScholarCross Ref
- R.J.P. van Berlo, L.F.A. Wessels, D. de Ridder, and M.J.T. Reinders, "Protein Complex Prediction Using an Integrative Bioinformatics Approach," J. Bioinformatics and Computational Biology, vol. 4, pp. 839-861, 2007.Google ScholarCross Ref
- Y. Qi, Z. Bar-Joseph, and J. Klein-Seetharaman, "Evaluation of Different Biological Data and Computational Classification Methods for Use in Protein Interaction Prediction," Proteins: Structure, Function, and Bioinformatics, vol. 63, pp. 490-500, 2006.Google ScholarCross Ref
- G.R.G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M.I. Jordan, "Learning the Kernel Matrix with Semidefinite Programming," Proc. 19th Int'l Conf. Machine Learning (ICML '02), pp. 323- 330, 2002. Google ScholarDigital Library
- G.R.G. Lanckriet, T. de Bie, N. Cristianini, M.I. Jordan, and W.S. Noble, "A Statistical Framework for Genomic Data Fusion," Bioinformatics, vol. 20, pp. 2626-2635, 2004. Google ScholarDigital Library
- S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, "Large Scale Multiple Kernel Learning," J. Machine Learning Research, vol. 7, pp. 1531-1565, 2006. Google ScholarDigital Library
- O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, "Choosing Multiple Parameters for Support Vector Machines," Machine Learning, vol. 46, pp. 131-159, 2002. Google ScholarDigital Library
- O. Bousquet and D. Herrmann, "On the Complexity of Learning the Kernel Matrix," Advances in Neural Information Processing Systems, S. Becker, S. Thrun, and K. Obermayer, eds., vol. 15. pp. 415-422, MIT Press, 2003.Google Scholar
- N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. Kandola, "On Kernel Target Alignment," Advances in Neural Information Processing Systems, vol. 14, pp. 367-374, MIT Press, 2002.Google Scholar
- P. Pavlidis, J. Weston, C. Jinsong, and W.S. Noble, "Learning Gene Functional Classifications from Multiple Data Types," J. Computational Biology, vol. 9, pp. 401-412, 2002.Google ScholarCross Ref
- A. Ben-Hur and W.S. Noble, "Kernel Methods for Predicting Protein-Protein Interactions," Bioinformatics, vol. 21, no. 1, pp. i38- i46, 2005. Google ScholarDigital Library
- S. Martin, D. Roe, and J.-L. Faulon, "Predicting Protein-Protein Interactions Using Signature Products," Bioinformatics, vol. 21, pp. 218-226, 2005. Google ScholarDigital Library
- T. Phienthrakul and B. Kijsirikul, "Evolutionary Strategies for Multi-Scale Radial Basis Function Kernels in Support Vector Machines," Proc. 2005 Conf. Genetic and Evolutionary Computation, pp. 905-911, 2005. Google ScholarDigital Library
- F. Friedrichs and C. Igel, "Evolutionary Tuning of Multiple SVM Parameters," Proc. European Symp. Artificial Neural Networks (ESANN), pp. 519-524, 2004.Google Scholar
- T. De Bie, C.-L. Tranchevent, L.M.M. van Oeffelen, and Y. Moreau, "Kernel-Based Data Fusion for Gene Prioritization," Bioinformatics, pp. i125-i132, 2007. Google ScholarDigital Library
- N. Hansen, S.D. Müller, and P. Koumoutsakos, "Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES)," Evolutionary Computation , vol. 11, pp. 1-18, 2003. Google ScholarDigital Library
- A. Auger and N. Hansen, "A Restart CMA Evolution Strategy with Increasing Population Size," Proc. IEEE Congress Evolutionary Computation (CEC '05), vol. 2, pp. 1769-1776, 2005.Google Scholar
- F. Bach, G.R.G. Lanckriet, and M.I. Jordan, "Multiple Kernel Learning, Conic Duality, and the SMO Algorithm," Proc. 21st Int'l Conf. Machine Learning, 2004. Google ScholarDigital Library
- F. Bach, R. Thibaux, and M.I. Jordan, "Computing Regularization Paths for Learning Multiple Kernels," J. Machine Learning Research, vol. 5, pp. 1391-1415, 2004.Google Scholar
- H.W. Mewes, K. Heumann, A. Kaps, K. Mayer, F. Pfeiffer, S. Stocker, and D. Frishman, "MIPS: A Database for Genomes and Protein Sequences," Nucleic Acids Research, vol. 28, pp. 37-40, 2000.Google ScholarCross Ref
- A. Ben-Hur and W.S. Noble, "Choosing Negative Examples for the Prediction of Protein-Protein Interactions," BMC Bioinformatics , vol. 7, no. 1, 2006.Google Scholar
- J. Gollub et al., "The Stanford Microarray Database: Data Access and Quality Assessment Tools," Nucleic Acids Research, vol. 31, no. 1, pp. 94-96, 2003.Google ScholarCross Ref
- T.R. Hughes et al., "Functional Discovery via a Compendium of Expression Profiles," Cell, vol. 102, pp. 109-126, 2000.Google ScholarCross Ref
- S.L. Tai, V.M. Boer, P. Daran-Lapujade, M.C. Walsh, J.H. de Winde, J.-M. Daran, and J.T. Pronk, "Two-Dimensional Transcriptome Analysis in Chemostat Cultures: Combinatorial Effects of Oxygen Availability and Macronutrient Limitation in Saccharomyces Cerevisiae," J. Biological Chemistry, vol. 280, no. 1, pp. 437- 447, 2005.Google ScholarCross Ref
- M.J. Dunham, H. Badrane, T. Ferea, J. Adams, P.O. Brown, F. Rosenzweig, and D. Botstein, "Characteristic Genome Rearrangements in Experimental Evolution of Saccharomyces Cerevisiae," Proc. Nat'l Academy of Sciences USA, vol. 99, pp. 16144-16149, 2002.Google ScholarCross Ref
- J.D. Lieb, L. Xiaole, D. Botstein, and P.O. Brown, "Promoter-Specific Binding of Rap1 Revealed by Genome-Wide Maps of Protein-DNA Association," Nature Genetics, vol. 28, pp. 327-334, 2001.Google ScholarCross Ref
- C.T. Harbison et al., "Transcriptional Regulatory Code of a Eukaryotic Genome," Nature, vol. 431, pp. 99-104, Sept. 2004.Google ScholarCross Ref
- N.J. Krogan et al., "High-Definition Macromolecular Composition of Yeast RNA-Processing Complexes," Molecular Cell, vol. 13, pp. 225-239, 2004.Google ScholarCross Ref
- I. Xenarios, L. Salwinski, X.J. Duan, P. Higney, S.-M. Kim, and D. Eisenberg, "DIP, the Database of Interacting Proteins: A Research Tool for Studying Cellular Networks of Protein Interactions," Nucleic Acids Research, vol. 30, no. 1, pp. 303-305, 2002.Google ScholarCross Ref
- L.J. Lu, Y. Xia, A. Paccanaro, H. Yu, and M. Gerstein, "Assessing the Limits of Genomic Data Integration for Predicting Protein Networks," Genome Research, vol. 15, no. 7, pp. 945-953, 2005.Google ScholarCross Ref
- H. Yu, N.M. Luscombe, H.X. Lu, X. Zhu, Y. Xia, J.-D.J. Han, N. Bertin, S. Chung, M. Vidal, and M. Gerstein, "Annotation Transfer between Genomes: Protein-Protein Interologs and Protein-dna Regulogs," Genome Research, vol. 14, no. 6, pp. 1107-1118, 2004.Google ScholarCross Ref
- C.-C. Chang and C.-J. Lin, "LIBSVM: A Library for Support Vector Machine," http://www.csie.ntu.edu.tw/cjlin/libsvm, 2008.Google Scholar
- R.P.W. Duin, P. Juszczak, P. Paclik, E. Pekalska, D. de Ridder, and D.M.J. Tax, "PRTools, A Matlab Toolbox for Pattern Recognition," http://prtools.org, 2009.Google Scholar
- C.-W. Hsu, C.-C. Chang, and L. Chih-Jen, "A Practical Guide to Support Vector Classification," technical report, Dept. of Computer Science and Information Eng., Nat'l Taiwan Univ., 2003.Google Scholar
- D.P. Lewis, T. Jebara, and W.S. Noble, "Support Vector Machine Learning from Heterogeneous Data: An Empirical Analysis Using Protein Sequence and Structure," Bioinformatics, vol. 22, pp. 2753- 2760, 2006. Google ScholarDigital Library
- S.L. Wong et al., "Combining Biological Networks to Predict Genetic Interactions," Proc. Nat'l Academy of Sciences USA, vol. 101, pp. 15682-15687, 2004.Google ScholarCross Ref
- W.-C. Kao, K.-M. Chung, C.-L. Sun, and C.-J. Lin, "Decomposition Methods for Linear Support Vector Machines," Neural Computation , vol. 16, no. 8, pp. 1689-1704, 2004. Google ScholarDigital Library
- T. Joachims, "A Support Vector Method for Multivariate Performance Measures," Proc. 22nd Int'l Conf. Machine Learning (ICML '05), pp. 377-384, 2005. Google ScholarDigital Library
Index Terms
- Evolutionary Optimization of Kernel Weights Improves Protein Complex Comembership Prediction
Recommendations
An Endosymbiotic Evolutionary Algorithm for Optimization
This paper proposes a new symbiotic evolutionary algorithm to solve complex optimization problems. This algorithm imitates the natural evolution process of endosymbionts, which is called endosymbiotic evolutionary algorithm. Existing symbiotic ...
Kernel methods for predicting protein--protein interactions
Motivation: Despite advances in high-throughput methods for discovering protein--protein interactions, the interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to ...
A synergistic approach for evolutionary optimization
GECCO '08: Proceedings of the 10th annual conference companion on Genetic and evolutionary computationOne of the major causes of premature convergence in Evolutionary Algorithm (EA) is loss of population diversity, which pushes the search space to a homogeneous or a near-homogeneous configuration. In particular, this can be a more complicated issue in ...
Comments