ABSTRACT
We present a protein fold recognition method that uses a comprehensive statistical interpretation of structural Hidden Markov Models (HMMs). Structure/fold recognition is done by summing the probabilities of all sequence-to-structure alignments. Conventionally, Boltzmann statistics dictate that the optimal alignment gives an estimate of the lowest free energy of the sequence conformation imposed by the structural model. The alignment is optimized for a scoring function that is interpreted as the free energy of an amino acid in a structural environment. Near-optimal alignments are ignored, regardless of how likely they might be compared to the optimal alignment. Here we investigate an alternative view. A structure model can be seen as a statistical representation of an ensemble of similar structures. The optimal alignment is always the most probable, but sub-optimal alignments may have comparable probabilities. These sub-optimal alignments can be interpreted as optimal alignments to the “other” structures from the ensemble, or as optimal alignments under minor fluctuations in the scoring function. Summing probabilities for all alignments gives an estimate of sequence-model compatibility. We have built a set of structural HMMs for 188 protein structures and have compared two methods for identifying the structure compatible with a sequence: by the optimal alignment probability and by the total probability. Fold recognition by total probability was 40% more accurate than fold recognition by the optimal alignment probability.
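The contrast between the two scores in the abstract can be sketched on a generic HMM. The optimal-alignment probability corresponds to the best single state path (a Viterbi-style maximum), and the total probability corresponds to the sum over all state paths (a forward-style sum). The sketch below is illustrative only and is not the structural HMM of the paper: the two states, the two-letter alphabet (`H`, `P`), and all probability values are hypothetical.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def hmm_score(seq, log_start, log_trans, log_emit, combine):
    """Run the HMM recursion in log space.

    combine=max       -> log-probability of the single best state path
    combine=logsumexp -> log of the total probability over all state paths
    """
    n = len(log_start)
    cur = [log_start[s] + log_emit[s][seq[0]] for s in range(n)]
    for sym in seq[1:]:
        cur = [
            combine([cur[p] + log_trans[p][s] for p in range(n)]) + log_emit[s][sym]
            for s in range(n)
        ]
    return combine(cur)

# Hypothetical toy model: two states, alphabet {H, P}. All values are made up.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [{"H": 0.8, "P": 0.2}, {"H": 0.3, "P": 0.7}]

log_start = [math.log(p) for p in start]
log_trans = [[math.log(p) for p in row] for row in trans]
log_emit = [{sym: math.log(p) for sym, p in row.items()} for row in emit]

seq = "HPPH"
best = hmm_score(seq, log_start, log_trans, log_emit, max)         # optimal path only
total = hmm_score(seq, log_start, log_trans, log_emit, logsumexp)  # all paths summed

print("log P(best path) :", best)
print("log P(all paths) :", total)
```

Because the total includes the best path as one of its terms, `total` is never smaller than `best`. The gap between the two grows when many near-optimal paths carry comparable probability, which is the situation the abstract argues should not be ignored.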