DOI: 10.1145/2382936.2382991
short-paper

Predicting protective bacterial antigens using random forest classifiers

Published: 7 October 2012

ABSTRACT

Identifying protective antigens from bacterial pathogens is important for developing vaccines. Most computational methods for predicting protein antigenicity rely on sequence similarity between a query protein sequence and at least one known antigen. Such methods limit our ability to predict novel antigens (i.e., antigens that are not homologous to any known antigen). Therefore, there is an urgent need for alignment-free computational methods for reliable prediction of protective antigens.
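As an illustration of what "alignment-free" means in practice: composition features map a protein of any length to a fixed-length numeric vector without comparing it to any known antigen. The abstract does not list the four representations that were evaluated, so the Python sketch below shows two generic composition encodings, amino acid composition (AAC) and dipeptide composition, as assumptions. The function names `aac` and `dipeptide_composition` are hypothetical, and this is not the authors' feature set.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; other symbols (X, B, Z, U, ...) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence):
    """Hypothetical helper: amino acid composition as 20 relative frequencies."""
    residues = [a for a in sequence.upper() if a in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Hypothetical helper: relative frequencies of the 400 ordered adjacent pairs."""
    s = sequence.upper()
    # Count only adjacent pairs whose two residues are both standard amino acids.
    pairs = Counter(
        (a, b) for a, b in zip(s, s[1:]) if a in AMINO_ACIDS and b in AMINO_ACIDS
    )
    total = sum(pairs.values()) or 1  # avoid division by zero for very short input
    return [pairs[p] / total for p in product(AMINO_ACIDS, repeat=2)]


if __name__ == "__main__":
    toy = "MKTAYIAKQR"  # arbitrary toy sequence, not from the paper's data
    features = aac(toy) + dipeptide_composition(toy)
    print(len(features))  # 20 + 400 = 420 values, whatever the sequence length
```

Because every protein yields the same 420 numbers, these vectors can be passed directly to a standard classifier, whereas a similarity-based method needs at least one known antigen to compare against.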

We evaluated the discriminative power of four different amino acid composition-derived feature representations using three classification methods (Logistic Regression, Support Vector Machine, and Random Forest) on a cross-validation data set of 193 protective bacterial antigens and 193 non-antigenic bacterial proteins. Our results show that, with all four data representations, Random Forest classifiers consistently outperform the other two classifiers. We compared HRF50, one of the best-performing Random Forest classifiers, with VaxiJen, SignalP, and ANTIGENpro on independent test sets derived from the Chlamydia trachomatis and Bartonella proteomes. HRF50 outperforms VaxiJen and is competitive with SignalP and ANTIGENpro in predicting protective antigens. We further show that combining SignalP with HRF50 yields a method, which we call BacGen, whose performance is comparable to or better than that of ANTIGENpro in predicting antigens in bacterial sequences. We conclude that features derived from amino acid sequence composition can be used effectively with Random Forest classifiers to build alignment-free predictors of protein antigenicity. BacGen is available as an online server at http://ailab.cs.iastate.edu/bacgen/.
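The abstract does not describe how the Random Forest classifiers were implemented or configured. For readers unfamiliar with the method, the following is a minimal, standard-library-only Python sketch of a Random Forest in Breiman's sense: each tree is grown on a bootstrap sample, each split considers only a random subset of features, and the trees vote. It assumes feature vectors (for example, composition frequencies) are already computed and that labels are 0 (non-antigen) or 1 (protective antigen). The class name `RandomForest` and the hyperparameters (`n_trees`, `max_depth`, square root of the feature count per split) are illustrative assumptions; this is not HRF50.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        # Exclude the largest value so the right child is never empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, rng, max_features, depth=0, max_depth=8):
    """Grow a CART-style tree; leaves are labels, internal nodes are tuples."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    # A random feature subset at each split is what makes the forest "random".
    feature_ids = rng.sample(range(len(X[0])), max_features)
    split = best_split(X, y, feature_ids)
    if split is None:  # simplification: stop instead of trying other features
        return majority
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], rng, max_features, depth + 1, max_depth),
        build_tree([X[i] for i in right], [y[i] for i in right], rng, max_features, depth + 1, max_depth),
    )


def predict_tree(node, row):
    """Follow thresholds from the root down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class RandomForest:
    def __init__(self, n_trees=100, max_features=None, max_depth=8, seed=0):
        self.n_trees = n_trees
        self.max_features = max_features
        self.max_depth = max_depth
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        m = self.max_features or max(1, int(d ** 0.5))  # default: sqrt(d) features
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            tree = build_tree([X[i] for i in idx], [y[i] for i in idx], self.rng, m,
                              max_depth=self.max_depth)
            self.trees.append(tree)
        return self

    def predict_proba(self, row):
        """Fraction of trees voting for class 1 (protective antigen)."""
        return sum(predict_tree(t, row) for t in self.trees) / len(self.trees)

    def predict(self, row):
        return int(self.predict_proba(row) >= 0.5)
```

For example, `RandomForest(n_trees=25, seed=1).fit(X, y).predict(row)` on lists of numeric feature vectors returns 0 or 1, and `predict_proba` gives the fraction of trees voting "antigen", which can serve as a ranking score. In practice a maintained implementation (for example Weka's RandomForest or scikit-learn's `RandomForestClassifier`) would be preferable; the hand-written version only makes the three ingredients visible.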


Published in

BCB '12: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
October 2012
725 pages
ISBN: 9781450316705
DOI: 10.1145/2382936

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

BCB '12 paper acceptance rate: 33 of 159 submissions, 21%
Overall acceptance rate: 254 of 885 submissions, 29%
