ABSTRACT
Identifying protective antigens from bacterial pathogens is important for developing vaccines. Most computational methods for predicting protein antigenicity rely on sequence similarity between a query protein sequence and at least one known antigen. Such methods limit our ability to predict novel antigens (i.e., antigens that are not homologous to any known antigen). Therefore, there is an urgent need for alignment-free computational methods for reliable prediction of protective antigens.
We evaluated the discriminative power of four feature representations derived from amino acid composition, using three classification methods (Logistic Regression, Support Vector Machine, and Random Forest) on a cross-validation data set of 193 protective bacterial antigens and 193 non-antigenic bacterial proteins. Our results show that, with all four data representations, Random Forest classifiers consistently outperform the other classifiers. We compared HRF50, one of the best-performing Random Forest classifiers, with VaxiJen and SignalP on independent test sets derived from the Chlamydia trachomatis and Bartonella proteomes. The results show that HRF50 outperforms VaxiJen and is competitive with SignalP and ANTIGENpro in predicting protective antigens. We further show that combining SignalP with HRF50 yields a method, which we call BacGen, whose performance is comparable to or better than that of ANTIGENpro in predicting antigens in bacterial sequences. We conclude that features derived from amino acid sequence composition can be used effectively to design alignment-free methods for predicting protein antigenicity with Random Forest classifiers. BacGen is available as an online server at: http://ailab.cs.iastate.edu/bacgen/.
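To make the idea of an alignment-free, composition-derived feature concrete, the following is a minimal sketch of plain amino acid composition: the fraction of each of the 20 standard residues in a sequence. It is an illustration only; the abstract does not specify the four representations that were evaluated, and the function name, the handling of non-standard residues, and the toy sequence are assumptions of this sketch.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue.

    Non-standard symbols (e.g. X, B, Z) are skipped; that choice is an
    assumption of this sketch, not something stated in the abstract.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # Fixed residue order, so every protein maps to a vector of the same length.
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, used only to show the call.
features = amino_acid_composition("MKKLLA")
```

Because every protein maps to a vector of the same length regardless of its own length, such features can be passed to a standard classifier (for example a Random Forest implementation) without aligning the query against any known antigen.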
Index Terms
Predicting protective bacterial antigens using random forest classifiers