ABSTRACT
Given a known protein sequence, predicting its secondary structure can help in understanding its three-dimensional (tertiary) structure, that is, how the protein folds. In this paper, we present an approach for predicting protein secondary structures. Unlike existing prediction methods, our approach introduces an encoding schema that weaves physicochemical information into the encoded vectors, and a prediction framework that combines context information with secondary structure segments. We employ a Support Vector Machine (SVM), trained with sevenfold cross-validation on the CB513 and RS126 data sets (two collections of protein secondary structure sequences), to uncover the structural differences among protein secondary structures. We then apply the sliding window technique to test a set of protein sequences based on the group classification learned from the training set. Our approach achieves 77.8% segment overlap accuracy (SOV) and 75.2% three-state overall per-residue accuracy (Q3), outperforming other prediction methods.
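To make two of the steps named above more concrete, the sliding window technique and the Q3 measure, here is a minimal Python sketch. The window width of 13, the "-" padding character, and the plain one-hot encoding are assumptions chosen for illustration. The one-hot encoding is a common baseline and is not the physicochemical encoding schema proposed in this paper. Q3 is computed as the percentage of residues whose predicted state (H, E, or C) matches the observed state; SOV is not shown because its segment-based definition is more involved.

```python
from typing import List, Sequence

# The 20 standard amino acids; any other character (including padding)
# maps to a shared extra slot in the one-hot encoding below.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sliding_windows(sequence: str, width: int = 13, pad: str = "-") -> List[str]:
    """Return one window per residue, centred on that residue.

    Both ends are padded so terminal residues also get a full-width window.
    The width must be odd so every window has a well-defined centre.
    """
    if width % 2 == 0:
        raise ValueError("window width must be odd")
    half = width // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + width] for i in range(len(sequence))]


def one_hot(window: str) -> List[int]:
    """Flatten a window into a 0/1 vector with 21 slots per position.

    Illustrative baseline only (20 residues + 1 'other/padding' slot);
    NOT the physicochemical encoding described in the abstract.
    """
    vector: List[int] = []
    for residue in window:
        slot = [0] * (len(AMINO_ACIDS) + 1)
        idx = AMINO_ACIDS.find(residue)
        slot[idx if idx >= 0 else len(AMINO_ACIDS)] = 1
        vector.extend(slot)
    return vector


def q3(predicted: Sequence[str], observed: Sequence[str]) -> float:
    """Three-state per-residue accuracy, as a percentage of all residues."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if len(observed) == 0:
        raise ValueError("empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    print(sliding_windows("MKTAY", width=3))  # ['-MK', 'MKT', 'KTA', 'TAY', 'AY-']
    print(q3("HHHCC", "HHECC"))  # 80.0
```

In a setup like this, each window's vector would serve as one SVM example, labelled with the secondary structure state of the window's centre residue, so the classifier sees the local sequence context around every residue it predicts.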