101 optimal PDB structure alignments: a branch-and-cut algorithm for the maximum contact map overlap problem

Authors:
Giuseppe Lancia (Celera Genomics, Rockville, MD and D.E.I., University of Padova)
Robert Carr (Sandia National Labs, Albuquerque, NM)
Brian Walenz (Celera Genomics, Rockville, MD)
Sorin Istrail (Celera Genomics, Rockville, MD)

RECOMB '01: Proceedings of the fifth annual international conference on Computational biology, April 2001, Pages 193–202. https://doi.org/10.1145/369133.369199

Published: 22 April 2001

ABSTRACT

Structure comparison is a fundamental problem for structural genomics. A variety of structure comparison methods have been proposed, and several protein structure classification servers (e.g., SCOP, DALI, CATH) have been built on them and are used extensively in practice. This area of research remains very active, energized every two years by the CASP folding competitions, but despite the extraordinary international research effort devoted to it, progress is slow. A fundamental aspect of this bottleneck is the absence of rigorous algorithmic methods. A recent excellent survey on structure comparison by Taylor et al. [23] records the state of the art: "In structure comparison, we do not even have an algorithm that guarantees an optimal answer for pairs of structures …"

In this paper we provide the first rigorous algorithm for structure comparison. Our method is based on an effective integer linear programming (IP) formulation of the protein structure contact map overlap (CMO) problem and a branch-and-cut strategy that employs lower-bounding heuristics at the branch nodes. Our algorithms identified a gallery of optimal and near-optimal structure alignments for pairs of proteins from the Protein Data Bank with up to 80 amino acids and about 150 contacts each, that is, problem instances of size about 300. Although these sizes also reflect our current limitations, these are the first provably optimal and near-optimal algorithms in the literature for a measure of structure similarity that sees extensive practical use. At the heart of our success in finding optimal alignments is a reduction of the CMO optimization to the maximum independent set (MIS) problem on special graphs. For CMO instances of size 300, the corresponding MIS graph instance contains about 10,000 nodes. While our algorithms are able to solve MIS problems of this size to optimality, the known exact algorithms for MIS on general graphs can at present solve only instances with up to a few hundred nodes. This is the first effective use of IP methods in protein structure comparison; the biomolecular structure literature contains only one other effective IP method, devoted to RNA comparison, due to Lenhof et al. [18].
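
To make the CMO objective concrete, the following is a minimal Python sketch, not the authors' branch-and-cut code. It assumes a contact map is given as a list of residue-index pairs, that an alignment is a set of noncrossing residue-to-residue edges, and that the score is the number of contacts in the first protein whose endpoints map onto a contact in the second. The function names (`overlap`, `conflicts`, `best_alignment`) are hypothetical, and the conflict test is only one plausible reading of how alignment edges could become nodes of an independent-set graph. The exhaustive search is a stand-in that is usable only on toy inputs.

```python
from itertools import combinations


def overlap(contacts_a, contacts_b, alignment):
    """Count contacts of A whose two endpoints are mapped onto a contact of B."""
    mapping = dict(alignment)
    b = {frozenset(c) for c in contacts_b}
    return sum(
        1
        for i, k in contacts_a
        if i in mapping and k in mapping
        and frozenset((mapping[i], mapping[k])) in b
    )


def conflicts(e, f):
    """Two alignment edges conflict if they share a residue or cross each other.

    Assumption: a valid alignment is a set of pairwise non-conflicting edges,
    i.e. an independent set in the graph whose nodes are candidate edges.
    """
    (i, j), (k, l) = e, f
    return i == k or j == l or (i < k) != (j < l)


def best_alignment(n_a, n_b, contacts_a, contacts_b):
    """Exhaustive search over noncrossing alignments (toy sizes only)."""
    edges = [(i, j) for i in range(n_a) for j in range(n_b)]
    best, best_score = (), 0
    for size in range(1, min(n_a, n_b) + 1):
        for subset in combinations(edges, size):
            # Skip edge sets that are not independent in the conflict graph.
            if any(conflicts(e, f) for e, f in combinations(subset, 2)):
                continue
            score = overlap(contacts_a, contacts_b, subset)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score


# Toy instance: 4 residues with 2 contacts versus 5 residues with 2 contacts.
toy_alignment, toy_score = best_alignment(
    4, 5, [(0, 2), (1, 3)], [(0, 3), (2, 4)]
)
print(toy_score)  # 2: both contacts of the first map can be preserved
```

The number of candidate edge subsets grows combinatorially with protein length, which is why an exact method at the sizes quoted above needs something like the IP formulation and cutting planes rather than enumeration.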

The hybrid heuristic approach that worked well for providing lower bounds in the branch-and-cut algorithm was also tried on large proteins in a test set suggested by Jeffrey Skolnick. The set comprises 33 proteins classified into four families: Flavodoxin-like fold CheY-related, Plastocyanin, TIM Barrel, and Ferritin. Over the set of all 528 pairwise structure alignments, we validated the clustering with 98.7% accuracy (1.3% false negatives and 0% false positives).
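
The figure of 528 alignments is the number of unordered pairs among 33 proteins, which a short Python check confirms. The pair count implied by the 1.3% false-negative rate is an inference from the rounded percentage, not a number stated in the abstract, so it should be read as approximate.

```python
from math import comb

n_proteins = 33

# Number of unordered protein pairs: C(33, 2) = 33 * 32 / 2.
n_pairs = comb(n_proteins, 2)

# Assumption: the reported 1.3% is a rounded fraction of all pairs.
implied_false_negatives = round(0.013 * n_pairs)

print(n_pairs)                  # 528
print(implied_false_negatives)  # 7 (approximate, derived from a rounded rate)
```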

References

1. E. Balas and C. S. Yu, Finding a maximum clique in an arbitrary graph, SIAM J. on Comp., 15(4):1054-1068, 1986.
2. H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, P. E. Bourne, The Protein Data Bank, Nucleic Acids Research, 28, pp. 235-242, 2000.
3. W. J. Cook, W. H. Cunningham, W. R. Pulleyblank and A. Schrijver, Combinatorial Optimization, John Wiley and Sons, New York, 1998.
4. P. Crescenzi and V. Kann, A compendium of NP optimization problems, http://www.nada.kth.se/~viggo, the web.
5. M. Grötschel, L. Lovász and A. Schrijver, The Ellipsoid Method and its Consequences in Combinatorial Optimization, Combinatorica, 1:169-197, 1981.
6. A. Godzik, The structural alignment between two proteins: Is there a unique answer?, Protein Science, 5:1325-1338, 1996.
7. A. Godzik, J. Skolnick and A. Kolinski, A topology fingerprint approach to inverse protein folding problem, J. Mol. Biol., 227:227-238, 1992.
8. A. Godzik and J. Skolnick, Flexible algorithm for direct multiple alignment of protein structures and sequences, CABIOS, 10(6):587-596, 1994.
9. M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, 1979.
10. D. Goldman, S. Istrail and C. Papadimitriou, Algorithmic Aspects of Protein Structure Similarity, Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, 512-522, 1999.
11. D. Goldman, PhD Thesis, Dept. of Computer Science, UC Berkeley, 2000.
12. A. Lucas, K. Dill and S. Istrail, Contact maps and the computational statistical mechanics aspects of protein folding (in preparation).
13. R. B. Hayward, Weakly Triangulated Graphs, J. of Comb. Theory, Series B, (39)200-209, 1985.
14. R. B. Hayward, C. Hoang and F. Maffray, Optimizing Weakly Triangulated Graphs, Graphs and Combinatorics, 1987.
15. L. Holm and C. Sander, 3-D lookup: fast protein structure searches at 90% reliability, Proceedings of the ISMB 1995, pp. 179-187, AAAI, 1995.
16. D. S. Johnson and M. A. Trick, eds., Cliques, Coloring, and Satisfiability, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, 1996.
17. W. Kabsch, A solution for the best rotation to relate two sets of vectors, Acta Cryst., A32, 922-923, 1978.
18. H. P. Lenhof, K. Reinert, M. Vingron, A Polyhedral Approach to RNA Sequence Structure Alignment, J. Comp. Biol., 5(3):517-530, 1998.
19. A. Lesk, 11th Lipari International Summer School in Computational Biology, 1999.
20. G. L. Nemhauser and L. Wolsey, Integer and Combinatorial Optimization, J. Wiley and Sons, 1988.
21. G. L. Nemhauser and L. E. Trotter, Vertex packings: Structural properties and algorithms, Mathematical Programming, 8:232-248, 1975.
22. A. Raghunathan, Algorithms for Weakly Triangulated Graphs, UC Berkeley, Tech. Rep. CSD-89-503, 1989.
23. I. Eidhammer, I. Jonassen and W. R. Taylor, Structure Comparison and Structure Prediction, to appear J. Comp. Biol., x(x), 2000.
24. D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Reading: Addison-Wesley, 1989.
25. J. H. Holland, Adaptation in Natural and Artificial Systems, Cambridge, MA: MIT Press, 1992.

Published in
RECOMB '01: Proceedings of the fifth annual international conference on Computational biology
April 2001
316 pages
ISBN: 1581133537
DOI: 10.1145/369133
Chairman:
Thomas Lengauer
German National Research Center for Information Technology, Germany
Copyright © 2001 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
RECOMB '01 Paper Acceptance Rate: 35 of 128 submissions, 27%. Overall Acceptance Rate: 148 of 538 submissions, 28%.