Statistical Characterization of Protein Ensembles

Published: 01 January 2008

Abstract

When accounting for structural fluctuations or measurement errors, a single rigid structure may not be sufficient to represent a protein. One approach to this problem is to represent the possible conformations as a discrete set of observed conformations, called an ensemble. In this work, we follow a different, richer approach: we introduce a framework for estimating probability density functions in very high dimensions and apply it to represent ensembles of folded proteins. The proposed approach combines kernel density estimation, maximum likelihood, cross-validation, and bootstrapping. We present the underlying theoretical and computational framework and apply it to artificial data and to protein ensembles obtained from molecular dynamics simulations. We compare the results with those obtained experimentally, illustrating the potential and advantages of this representation.
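
As a rough illustration of how kernel density estimation, maximum likelihood, and cross-validation can fit together, the sketch below selects the bandwidth of a one-dimensional Gaussian kernel density estimate by maximizing the leave-one-out log-likelihood. It is not the framework of the paper, which targets very high dimensions. The synthetic bimodal data, the candidate bandwidths, and the function names (`loo_log_likelihood`, `select_bandwidth`, `kde`) are all assumptions made for this example.

```python
import math
import random


def loo_log_likelihood(samples, h):
    """Leave-one-out log-likelihood of a 1-D Gaussian KDE with bandwidth h."""
    n = len(samples)
    norm = 1.0 / ((n - 1) * h * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, x in enumerate(samples):
        # Density at x estimated from every sample except x itself.
        dens = sum(
            math.exp(-0.5 * ((x - y) / h) ** 2)
            for j, y in enumerate(samples)
            if j != i
        )
        # Floor the density so an isolated point does not produce log(0).
        total += math.log(max(dens * norm, 1e-300))
    return total


def select_bandwidth(samples, candidates):
    """Return the candidate bandwidth with the highest leave-one-out likelihood."""
    return max(candidates, key=lambda h: loo_log_likelihood(samples, h))


def kde(samples, h):
    """Return the Gaussian KDE density function for the given bandwidth."""
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - y) / h) ** 2) for y in samples)


rng = random.Random(0)
# Synthetic bimodal data standing in for a single conformational coordinate
# (an assumption for this example, not data from the paper).
data = [rng.gauss(-2.0, 0.5) for _ in range(100)]
data += [rng.gauss(2.0, 0.5) for _ in range(100)]

candidates = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]  # hypothetical bandwidth grid
best_h = select_bandwidth(data, candidates)
density = kde(data, best_h)
```

Leaving each point out when scoring it matters: evaluating the likelihood on the same points used to build the estimate would always favor the smallest bandwidth. Bootstrapping, the remaining technique named in the abstract, could then be layered on top by resampling `data` with replacement and repeating the fit to gauge the variability of the estimate.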


Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 5, Issue 1
January 2008
159 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Author Tags

  1. Bayesian networks
  2. bootstrapping
  3. cross-validation
  4. density estimation
  5. graphical models
  6. maximum likelihood
  7. protein ensembles
