PairProSVM: Protein Subcellular Localization Based on Local Pairwise Profile Alignment and SVM

Published: 01 July 2008

Abstract

The subcellular locations of proteins are important functional annotations, and an effective and reliable subcellular localization method is necessary for proteomics research. This paper introduces a new method, PairProSVM, that automatically predicts the subcellular locations of proteins. The profiles of all protein sequences in the training set are constructed by PSI-BLAST, and the pairwise profile-alignment scores are used to form feature vectors for training a support vector machine (SVM) classifier. PairProSVM was found to outperform methods based on sequence alignment and on amino-acid composition, even when most of the homologous sequences had been removed. This paper also demonstrates that the performance of PairProSVM is sensitive to, and roughly proportional to, the degree to which its kernel matrix satisfies Mercer's condition. PairProSVM was evaluated on the protein datasets of Reinhardt and Hubbard, Huang and Li, and Gardy et al. Its overall accuracies on these three datasets reach 99.3%, 76.5%, and 91.9%, respectively, which are higher than or comparable to those obtained by sequence alignment and by the other methods compared in this paper.
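The two ideas in the abstract, turning pairwise alignment scores into feature vectors and checking a kernel matrix against Mercer's condition, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a raw-residue Smith-Waterman score with made-up match, mismatch, and gap values stands in for PSI-BLAST profile-to-profile alignment, the training sequences are hypothetical, and all function names are illustrative. On a finite sample, Mercer's condition reduces to the kernel matrix being symmetric positive semidefinite (PSD). The symmetric matrix of raw alignment scores need not be PSD, whereas a linear kernel computed on the score vectors always is.

```python
import math


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    Hypothetical stand-in: scores raw residues with toy match/mismatch/gap
    values instead of aligning PSI-BLAST profiles.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: cell scores are floored at zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def score_vector(seq, train):
    """Feature vector: one alignment score per training sequence."""
    return [float(local_alignment_score(seq, t)) for t in train]


def linear_kernel(vectors):
    """Gram matrix of dot products; positive semidefinite by construction."""
    return [[sum(u * v for u, v in zip(x, y)) for y in vectors] for x in vectors]


def jacobi_eigenvalues(m, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations (ascending)."""
    a = [list(map(float, row)) for row in m]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def satisfies_mercer(k, rel_tol=1e-9):
    """A symmetric kernel matrix meets Mercer's condition iff it is PSD."""
    eig = jacobi_eigenvalues(k)
    scale = max(abs(e) for e in eig) or 1.0
    return eig[0] >= -rel_tol * scale


if __name__ == "__main__":
    # Hypothetical toy "training set"; real inputs would be protein profiles.
    train = ["MKTAYIAKQR", "MKTAHIAKQR", "GGSWQERTLL", "GGSWQDRTLV"]
    scores = [score_vector(t, train) for t in train]  # symmetric n x n
    print("raw score matrix meets Mercer:", satisfies_mercer(scores))
    print("linear kernel on score vectors meets Mercer:",
          satisfies_mercer(linear_kernel(scores)))
```

A precomputed matrix such as `linear_kernel(scores)` is the form accepted by SVM libraries that support precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. How far the smallest eigenvalue of the raw score matrix falls below zero is one possible way to quantify a Mercer violation; it is not necessarily the measure used in the paper.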

References

[1]
K. Nakai, "Protein Sorting Signals and Prediction of Subcellular Localization," Advances in Protein Chemistry, vol. 54, no. 1, pp. 277-344, 2000.
[2]
K. Nakai and M. Kanehisa, "Expert System for Predicting Protein Localization Sites in Gram-Negative Bacteria," Proteins: Structure, Function, and Genetics, vol. 11, no. 2, pp. 95-110, 1991.
[3]
K. Nakai and M. Kanehisa, "A Knowledge Base for Predicting Protein Localization Sites in Eukaryotic Cells," Genomics, vol. 14, pp. 897-911, 1992.
[4]
O. Emanuelsson, H. Nielsen, S. Brunak, and G. von Heijne, "Predicting Subcellular Localization of Proteins Based on Their N-Terminal Amino Acid Sequence," J. Molecular Biology, vol. 300, pp. 1005-1016, 2000.
[5]
H. Nielsen, J. Engelbrecht, S. Brunak, and G. von Heijne, "A Neural Network Method for Identification of Prokaryotic and Eukaryotic Signal Peptides and Prediction of Their Cleavage Sites," Int'l J. Neural Systems, vol. 8, pp. 581-599, 1997.
[6]
H. Nielsen, S. Brunak, and G. von Heijne, "Machine Learning Approaches for the Prediction of Signal Peptides and Other Protein Sorting Signals," Protein Eng., vol. 12, no. 1, pp. 3-9, 1999.
[7]
P. Horton, K.J. Park, T. Obayashi, and K. Nakai, "Protein Subcellular Localization Prediction with WoLF PSORT," Proc. Fourth Ann. Asia Pacific Bioinformatics Conf. (APBC '06), pp. 39-48, 2006.
[8]
S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman, "Basic Local Alignment Search Tool," J. Molecular Biology, vol. 215, pp. 403-410, 1990.
[9]
H. Nakashima and K. Nishikawa, "Discrimination of Intracellular and Extracellular Proteins Using Amino Acid Composition and Residue-Pair Frequencies," J. Molecular Biology, vol. 238, pp. 54-61, 1994.
[10]
J. Cedano, P. Aloy, J.A. Perez-Pons, and E. Querol, "Relation between Amino Acid Composition and Cellular Location of Proteins," J. Molecular Biology, vol. 266, pp. 594-600, 1997.
[11]
A. Reinhardt and T. Hubbard, "Using Neural Networks for Prediction of the Subcellular Location of Proteins," Nucleic Acids Research, vol. 26, pp. 2230-2236, 1998.
[12]
S.J. Hua and Z.R. Sun, "Support Vector Machine Approach for Protein Subcellular Localization Prediction," Bioinformatics, vol. 17, pp. 721-728, 2001.
[13]
Z. Yuan, "Prediction of Protein Subcellular Locations Using Markov Chain Models," FEBS Letters, vol. 451, no. 1, pp. 23-26, 1999.
[14]
K.J. Park and M. Kanehisa, "Prediction of Protein Subcellular Locations by Support Vector Machines Using Compositions of Amino Acids and Amino Acid Pairs," Bioinformatics, vol. 19, no. 13, pp. 1656-1663, 2003.
[15]
Y. Huang and Y.D. Li, "Prediction of Protein Subcellular Locations Using Fuzzy K-NN Method," Bioinformatics, vol. 20, no. 1, pp. 21-28, 2004.
[16]
K.C. Chou, "Prediction of Protein Cellular Attributes Using Pseudo-Amino Acid Composition," Proteins: Structure, Function, and Genetics, vol. 43, pp. 246-255, 2001.
[17]
Y.D. Cai and K.C. Chou, "Predicting Subcellular Localization of Proteins in a Hybridization Space," Bioinformatics, vol. 20, pp. 1151-1156, 2004.
[18]
R. Nair and B. Rost, "Sequence Conserved for Subcellular Localization," Protein Science, vol. 11, pp. 2836-2847, 2002.
[19]
Z. Lu, D. Szafron, R. Greiner, P. Lu, D.S. Wishart, B. Poulin, J. Anvik, C. Macdonell, and R. Eisner, "Predicting Subcellular Localization of Proteins Using Machine-Learned Classifiers," Bioinformatics, vol. 20, no. 4, pp. 547-556, 2004.
[20]
J.K. Kim, G.P.S. Raghava, S.Y. Bang, and S. Choi, "Prediction of Subcellular Localization of Proteins Using Pairwise Sequence Alignment and Support Vector Machine," Pattern Recognition Letters, vol. 27, no. 9, pp. 996-1001, 2006.
[21]
J.L. Gardy, C. Spencer, K. Wang, M. Ester, G.E. Tusnady, I. Simon, S.J. Hua, K. deFays, C. Lambert, K. Nakai, and F.S.L. Brinkman, "PSORT-B: Improving Protein Subcellular Localization Prediction for Gram-Negative Bacteria," Nucleic Acids Research, vol. 31, no. 13, pp. 3613-3617, 2003.
[22]
M. Bhasin and G.P.S. Raghava, "ESLpred: SVM-Based Method for Subcellular Localization of Eukaryotic Proteins Using Dipeptide Composition and PSI-BLAST," Nucleic Acids Research, vol. 32, Webserver Issue, pp. 414-419, 2004.
[23]
A. Garg, M. Bhasin, and G.P.S. Raghava, "SVM-Based Method for Subcellular Localization of Human Proteins Using Amino Acid Compositions, Their Order and Similarity Search," J. Biological Chemistry, vol. 280, pp. 14427-14432, 2005.
[24]
S. Busuttil, J. Abela, and G.J. Pace, "Support Vector Machines with Profile-Based Kernels for Remote Protein Homology Detection," Genome Informatics, vol. 15, no. 2, pp. 191-200, 2004.
[25]
H. Rangwala and G. Karypis, "Profile-Based Direct Kernels for Remote Homology Detection and Fold Recognition," Bioinformatics, vol. 21, no. 23, pp. 4239-4247, 2005.
[26]
R. Kuang, E. Ie, K. Wang, K. Wang, M. Siddiqi, Y. Freund, and C. Leslie, "Profile-Based String Kernels for Remote Homology Detection and Motif Extraction," J. Bioinformatics and Computational Biology, vol. 3, pp. 527-550, 2005.
[27]
S.F. Altschul, T.L. Madden, A.A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D.J. Lipman, "Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs," Nucleic Acids Research, vol. 25, pp. 3389-3402, 1997.
[28]
S. Henikoff and J.G. Henikoff, "Amino Acid Substitution Matrices from Protein Blocks," Proc. Nat'l Academy of Sciences, pp. 10915-10919, 1992.
[29]
T.F. Smith and M.S. Waterman, "Comparison of Biosequences," Advances in Applied Math., vol. 2, pp. 482-489, 1981.
[30]
O. Gotoh, "An Improved Algorithm for Matching Biological Sequences," J. Molecular Biology, vol. 162, pp. 705-708, 1982.
[31]
E.G. Shpaer, M. Robinson, D. Yee, J.D. Candlin, R. Mines, and T. Hunkapiller, "Sensitivity and Selectivity in Protein Similarity Searches: A Comparison of Smith-Waterman in Hardware to BLAST and FASTA," Genomics, vol. 38, pp. 179-191, 1996.
[32]
L. Liao and W.S. Noble, "Combining Pairwise Sequence Similarity and Support Vector Machines for Detecting Remote Protein Evolutionary and Structural Relationships," J. Computational Biology, vol. 10, no. 6, pp. 857-868, 2003.
[33]
L. Rychlewski, B. Zhang, and A. Godzik, "Fold and Function Predictions for Mycoplasma Genitalium Proteins," Folding and Design, vol. 3, no. 4, pp. 229-238, 1998.
[34]
B. Boeckmann, A. Bairoch, R. Apweiler, M.C. Blatter, A. Estreicher, E. Gasteiger, M.J. Martin, K. Michoud, C. O'Donovan, I. Phan, S. Pilbout, and M. Schneider, "The SWISS-PROT Protein Knowledgebase and Its Supplement TrEMBL in 2003," Nucleic Acids Research, vol. 31, pp. 365-370, 2003.
[35]
B.W. Matthews, "Comparison of Predicted and Observed Secondary Structure of T4 Phage Lysozyme," Biochimica et Biophysica Acta, vol. 405, pp. 442-451, 1975.
[36]
M. Bhasin, A. Garg, and G.P.S. Raghava, "PSLpred: Prediction of Subcellular Localization of Bacterial Proteins," Bioinformatics, vol. 21, no. 10, pp. 2522-2524, 2005.
[37]
C.S. Yu, C.J. Lin, and J.K. Hwang, "Predicting Subcellular Localization of Proteins for Gram-Negative Bacteria by Support Vector Machines Based on N-Peptide Compositions," Protein Science, vol. 13, pp. 1402-1406, 2004.
[38]
S.Y. Kung and M.W. Mak, "Feature Selection for Pairwise Scoring Kernels with Applications to Protein Subcellular Localization," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing (ICASSP '07), pp. 569-572, 2007.
[39]
P. Donnes and A. Hoglund, "Predicting Protein Subcellular Localization: Past, Present, and Future," Genomics, Proteomics, and Bioinformatics, vol. 2, no. 4, pp. 209-215, 2004.

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 5, Issue 3
July 2008
159 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Publication History

Published in TCBB Volume 5, Issue 3

Author Tags

  1. Kernel Methods
  2. Mercer condition
  3. Subcellular localization
  4. Support Vector Machines
  5. profile alignment

Qualifiers

  • Article

Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0
Reflects downloads up to 01 Mar 2025

Cited By

  • (2017) Transductive Learning for Multi-Label Protein Subchloroplast Localization Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14:1 (212-224). DOI: 10.1109/TCBB.2016.2527657. Online publication date: 1-Jan-2017.
  • (2016) mGOF-loc. Neurocomputing, 217:C (73-82). DOI: 10.1016/j.neucom.2015.09.137. Online publication date: 12-Dec-2016.
  • (2016) Predicting protein subcellular localization based on information content of gene ontology terms. Computational Biology and Chemistry, 65:C (1-7). DOI: 10.1016/j.compbiolchem.2016.09.009. Online publication date: 1-Dec-2016.
  • (2015) Identifying affinity classes of inorganic materials binding sequences via a graph-based model. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12:1 (193-204). DOI: 10.1109/TCBB.2014.2321158. Online publication date: 1-Jan-2015.
  • (2013) A Framework for Identifying Affinity Classes of Inorganic Materials Binding Peptide Sequences. Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics (545-551). DOI: 10.1145/2506583.2506628. Online publication date: 22-Sep-2013.
  • (2012) StruLocPred. International Journal of Data Mining and Bioinformatics, 6:2 (130-143). DOI: 10.1504/IJDMB.2012.048173. Online publication date: 1-Jul-2012.
  • (2011) Robust prediction of protein subcellular localization combining PCA and WSVMs. Computers in Biology and Medicine, 41:8 (648-652). DOI: 10.1016/j.compbiomed.2011.05.016. Online publication date: 1-Aug-2011.
