article

Evolutionary Optimization of Kernel Weights Improves Protein Complex Comembership Prediction

Authors:
Marc Hulsman

Delft University of Technology, Delft

Delft University of Technology, Delft
View Profile

,
Marcel J. T. Reinders

Delft University of Technology, Delft

Delft University of Technology, Delft
View Profile

,
Dick de Ridder

Delft University of Technology, Delft

Delft University of Technology, Delft
View Profile

IEEE/ACM Transactions on Computational Biology and Bioinformatics Volume 6 Issue 3pp 427–437https://doi.org/10.1109/TCBB.2008.137

Published:01 July 2009Publication History

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Abstract

In recent years, more and more high-throughput data sources useful for protein complex prediction have become available (e.g., gene sequence, mRNA expression, and interactions). The integration of these different data sources can be challenging. Recently, it has been recognized that kernel-based classifiers are well suited for this task. However, the different kernels (data sources) are often combined using equal weights. Although several methods have been developed to optimize kernel weights, no large-scale example of an improvement in classifier performance has been shown yet. In this work, we employ an evolutionary algorithm to determine weights for a larger set of kernels by optimizing a criterion based on the area under the ROC curve. We show that setting the right kernel weights can indeed improve performance. We compare this to the existing kernel weight optimization methods (i.e., (regularized) optimization of the SVM criterion or aligning the kernel with an ideal kernel) and find that these do not result in a significant performance improvement and can even cause a decrease in performance. Results also show that an expert approach of assigning high weights to features with high individual performance is not necessarily the best strategy.

References

B. Schölkopf, K. Tsuda, and J.-P. Vert, Kernel Methods in Computational Biology. MIT Press, 2004.Google Scholar
J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. Cambridge Univ. Press, 2004. Google ScholarDigital Library
A. Ben-Hur, C.S. Ong, S. Sonnenburg, B. Schölkopf, and G. Rätsch, "Support Vector Machines and Kernels for Computational Biology," PLoS Computational Biology, vol. 4, no. 10, Oct. 2008.Google Scholar
C. Cortes and V. Vapnik, "Support Vector Networks," Machine Learning, vol. 20, pp. 273-297, 1995. Google ScholarDigital Library
Y. Ho et al., "Systematic Identification of Protein Complexes in Saccharomyces Cerevisiae by Mass Spectrometry," Nature, vol. 415, pp. 180-183, Jan. 2002.Google ScholarCross Ref
A.-C. Gavin et al., "Functional Organization of the Yeast Proteome by Systematic Analysis of Protein Complexes," Nature, vol. 415, pp. 141-147, 2002.Google ScholarCross Ref
G.D. Bader and C.W.V. Hogue, "Analyzing Yeast Protein-Protein Interaction Data Obtained from Different Sources," Nature Biotechnology, vol. 20, pp. 991-997, 2002.Google ScholarCross Ref
C. von Mering, R. Krause, B. Snel, M. Cornell, S.G. Oliver, S. Fields, and P. Bork, "Comparative Assessment of Large-Scale Data Sets of Protein-Protein Interactions," Nature, vol. 417, pp. 399-403, 2002.Google ScholarCross Ref
R. Jansen and M. Gerstein, "Analyzing Protein Function on a Genomic Scale: The Importance of Gold-Standard Positives and Negatives for Network Prediction," Current Opinion in Microbiology , vol. 7, pp. 535-545, 2004.Google ScholarCross Ref
G.D. Bader and C.W.V. Hogue, "An Automated Method for Finding Molecular Complexes in Large Protein Interaction Networks," BMC Bioinformatics, vol. 4, 2003.Google Scholar
R. Jansen, Y. Haiyuan, D. Greenbaum, Y. Kluger, N.J. Krogan, S. Chung, A. Emili, M. Snyder, J.F. Greenblatt, and M. Gerstein, "A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data," Science, vol. 302, pp. 449-453, 2003.Google ScholarCross Ref
L.V. Zhang, S.L. Wong, O.D. King, and F.P. Roth, "Predicting Co-Complexed Protein Pairs Using Genomic and Proteomic Data Integration," BMC Bioinformatics, vol. 5, i38-i46, 2004.Google ScholarCross Ref
R.J.P. van Berlo, L.F.A. Wessels, D. de Ridder, and M.J.T. Reinders, "Protein Complex Prediction Using an Integrative Bioinformatics Approach," J. Bioinformatics and Computational Biology, vol. 4, pp. 839-861, 2007.Google ScholarCross Ref
Y. Qi, Z. Bar-Joseph, and J. Klein-Seetharaman, "Evaluation of Different Biological Data and Computational Classification Methods for Use in Protein Interaction Prediction," Proteins: Structure, Function, and Bioinformatics, vol. 63, pp. 490-500, 2006.Google ScholarCross Ref
G.R.G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M.I. Jordan, "Learning the Kernel Matrix with Semidefinite Programming," Proc. 19th Int'l Conf. Machine Learning (ICML '02), pp. 323- 330, 2002. Google ScholarDigital Library
G.R.G. Lanckriet, T. de Bie, N. Cristianini, M.I. Jordan, and W.S. Noble, "A Statistical Framework for Genomic Data Fusion," Bioinformatics, vol. 20, pp. 2626-2635, 2004. Google ScholarDigital Library
S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf, "Large Scale Multiple Kernel Learning," J. Machine Learning Research, vol. 7, pp. 1531-1565, 2006. Google ScholarDigital Library
O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, "Choosing Multiple Parameters for Support Vector Machines," Machine Learning, vol. 46, pp. 131-159, 2002. Google ScholarDigital Library
O. Bousquet and D. Herrmann, "On the Complexity of Learning the Kernel Matrix," Advances in Neural Information Processing Systems, S. Becker, S. Thrun, and K. Obermayer, eds., vol. 15. pp. 415-422, MIT Press, 2003.Google Scholar
N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. Kandola, "On Kernel Target Alignment," Advances in Neural Information Processing Systems, vol. 14, pp. 367-374, MIT Press, 2002.Google Scholar
P. Pavlidis, J. Weston, C. Jinsong, and W.S. Noble, "Learning Gene Functional Classifications from Multiple Data Types," J. Computational Biology, vol. 9, pp. 401-412, 2002.Google ScholarCross Ref
A. Ben-Hur and W.S. Noble, "Kernel Methods for Predicting Protein-Protein Interactions," Bioinformatics, vol. 21, no. 1, pp. i38- i46, 2005. Google ScholarDigital Library
S. Martin, D. Roe, and J.-L. Faulon, "Predicting Protein-Protein Interactions Using Signature Products," Bioinformatics, vol. 21, pp. 218-226, 2005. Google ScholarDigital Library
T. Phienthrakul and B. Kijsirikul, "Evolutionary Strategies for Multi-Scale Radial Basis Function Kernels in Support Vector Machines," Proc. 2005 Conf. Genetic and Evolutionary Computation, pp. 905-911, 2005. Google ScholarDigital Library
F. Friedrichs and C. Igel, "Evolutionary Tuning of Multiple SVM Parameters," Proc. European Symp. Artificial Neural Networks (ESANN), pp. 519-524, 2004.Google Scholar
T. De Bie, C.-L. Tranchevent, L.M.M. van Oeffelen, and Y. Moreau, "Kernel-Based Data Fusion for Gene Prioritization," Bioinformatics, pp. i125-i132, 2007. Google ScholarDigital Library
N. Hansen, S.D. Müller, and P. Koumoutsakos, "Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES)," Evolutionary Computation , vol. 11, pp. 1-18, 2003. Google ScholarDigital Library
A. Auger and N. Hansen, "A Restart CMA Evolution Strategy with Increasing Population Size," Proc. IEEE Congress Evolutionary Computation (CEC '05), vol. 2, pp. 1769-1776, 2005.Google Scholar
F. Bach, G.R.G. Lanckriet, and M.I. Jordan, "Multiple Kernel Learning, Conic Duality, and the SMO Algorithm," Proc. 21st Int'l Conf. Machine Learning, 2004. Google ScholarDigital Library
F. Bach, R. Thibaux, and M.I. Jordan, "Computing Regularization Paths for Learning Multiple Kernels," J. Machine Learning Research, vol. 5, pp. 1391-1415, 2004.Google Scholar
H.W. Mewes, K. Heumann, A. Kaps, K. Mayer, F. Pfeiffer, S. Stocker, and D. Frishman, "MIPS: A Database for Genomes and Protein Sequences," Nucleic Acids Research, vol. 28, pp. 37-40, 2000.Google ScholarCross Ref
A. Ben-Hur and W.S. Noble, "Choosing Negative Examples for the Prediction of Protein-Protein Interactions," BMC Bioinformatics , vol. 7, no. 1, 2006.Google Scholar
J. Gollub et al., "The Stanford Microarray Database: Data Access and Quality Assessment Tools," Nucleic Acids Research, vol. 31, no. 1, pp. 94-96, 2003.Google ScholarCross Ref
T.R. Hughes et al., "Functional Discovery via a Compendium of Expression Profiles," Cell, vol. 102, pp. 109-126, 2000.Google ScholarCross Ref
S.L. Tai, V.M. Boer, P. Daran-Lapujade, M.C. Walsh, J.H. de Winde, J.-M. Daran, and J.T. Pronk, "Two-Dimensional Transcriptome Analysis in Chemostat Cultures: Combinatorial Effects of Oxygen Availability and Macronutrient Limitation in Saccharomyces Cerevisiae," J. Biological Chemistry, vol. 280, no. 1, pp. 437- 447, 2005.Google ScholarCross Ref
M.J. Dunham, H. Badrane, T. Ferea, J. Adams, P.O. Brown, F. Rosenzweig, and D. Botstein, "Characteristic Genome Rearrangements in Experimental Evolution of Saccharomyces Cerevisiae," Proc. Nat'l Academy of Sciences USA, vol. 99, pp. 16144-16149, 2002.Google ScholarCross Ref
J.D. Lieb, L. Xiaole, D. Botstein, and P.O. Brown, "Promoter-Specific Binding of Rap1 Revealed by Genome-Wide Maps of Protein-DNA Association," Nature Genetics, vol. 28, pp. 327-334, 2001.Google ScholarCross Ref
C.T. Harbison et al., "Transcriptional Regulatory Code of a Eukaryotic Genome," Nature, vol. 431, pp. 99-104, Sept. 2004.Google ScholarCross Ref
N.J. Krogan et al., "High-Definition Macromolecular Composition of Yeast RNA-Processing Complexes," Molecular Cell, vol. 13, pp. 225-239, 2004.Google ScholarCross Ref
I. Xenarios, L. Salwinski, X.J. Duan, P. Higney, S.-M. Kim, and D. Eisenberg, "DIP, the Database of Interacting Proteins: A Research Tool for Studying Cellular Networks of Protein Interactions," Nucleic Acids Research, vol. 30, no. 1, pp. 303-305, 2002.Google ScholarCross Ref
L.J. Lu, Y. Xia, A. Paccanaro, H. Yu, and M. Gerstein, "Assessing the Limits of Genomic Data Integration for Predicting Protein Networks," Genome Research, vol. 15, no. 7, pp. 945-953, 2005.Google ScholarCross Ref
H. Yu, N.M. Luscombe, H.X. Lu, X. Zhu, Y. Xia, J.-D.J. Han, N. Bertin, S. Chung, M. Vidal, and M. Gerstein, "Annotation Transfer between Genomes: Protein-Protein Interologs and Protein-dna Regulogs," Genome Research, vol. 14, no. 6, pp. 1107-1118, 2004.Google ScholarCross Ref
C.-C. Chang and C.-J. Lin, "LIBSVM: A Library for Support Vector Machine," http://www.csie.ntu.edu.tw/cjlin/libsvm, 2008.Google Scholar
R.P.W. Duin, P. Juszczak, P. Paclik, E. Pekalska, D. de Ridder, and D.M.J. Tax, "PRTools, A Matlab Toolbox for Pattern Recognition," http://prtools.org, 2009.Google Scholar
C.-W. Hsu, C.-C. Chang, and L. Chih-Jen, "A Practical Guide to Support Vector Classification," technical report, Dept. of Computer Science and Information Eng., Nat'l Taiwan Univ., 2003.Google Scholar
D.P. Lewis, T. Jebara, and W.S. Noble, "Support Vector Machine Learning from Heterogeneous Data: An Empirical Analysis Using Protein Sequence and Structure," Bioinformatics, vol. 22, pp. 2753- 2760, 2006. Google ScholarDigital Library
S.L. Wong et al., "Combining Biological Networks to Predict Genetic Interactions," Proc. Nat'l Academy of Sciences USA, vol. 101, pp. 15682-15687, 2004.Google ScholarCross Ref
W.-C. Kao, K.-M. Chung, C.-L. Sun, and C.-J. Lin, "Decomposition Methods for Linear Support Vector Machines," Neural Computation , vol. 16, no. 8, pp. 1689-1704, 2004. Google ScholarDigital Library
T. Joachims, "A Support Vector Method for Multivariate Performance Measures," Proc. 22nd Int'l Conf. Machine Learning (ICML '05), pp. 377-384, 2005. Google ScholarDigital Library

Index Terms

Evolutionary Optimization of Kernel Weights Improves Protein Complex Comembership Prediction
1. Applied computing
  1. Life and medical sciences
2. Computing methodologies

Recommendations

An Endosymbiotic Evolutionary Algorithm for Optimization

This paper proposes a new symbiotic evolutionary algorithm to solve complex optimization problems. This algorithm imitates the natural evolution process of endosymbionts, which is called endosymbiotic evolutionary algorithm. Existing symbiotic ...
Read More
Kernel methods for predicting protein--protein interactions

Motivation: Despite advances in high-throughput methods for discovering protein--protein interactions, the interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to ...
Read More
A synergistic approach for evolutionary optimization
GECCO '08: Proceedings of the 10th annual conference companion on Genetic and evolutionary computation

One of the major causes of premature convergence in Evolutionary Algorithm (EA) is loss of population diversity, which pushes the search space to a homogeneous or a near-homogeneous configuration. In particular, this can be a more complicated issue in ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

IEEE/ACM Transactions on Computational Biology and Bioinformatics Volume 6, Issue 3
July 2009
159 pages
ISSN:1545-5963
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
IEEE Computer Society Press
Washington, DC, United States
Publication History
- Published: 1 July 2009
Published in tcbb Volume 6, Issue 3
Author Tags
Classifier design and evaluation
biology and genetics
evolutionary computing and genetic algorithms.
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 2
  Total Citations
  View Citations
- 195
  Total Downloads
- Downloads (Last 12 months)1
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Evolutionary Optimization of Kernel Weights Improves Protein Complex Comembership Prediction

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Abstract

References

Cited By

Index Terms

Recommendations

An Endosymbiotic Evolutionary Algorithm for Optimization

Kernel methods for predicting protein--protein interactions

A synergistic approach for evolutionary optimization

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Evolutionary Optimization of Kernel Weights Improves Protein Complex Comembership Prediction

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Abstract

References

Cited By

Index Terms

Recommendations

An Endosymbiotic Evolutionary Algorithm for Optimization

Kernel methods for predicting protein--protein interactions

A synergistic approach for evolutionary optimization

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media