Abstract
This paper introduces Bidirectional Long Short-Term Memory networks (BLSTM), an algorithm for processing sequential data. This supervised learning method trains a special recurrent neural network to exploit very long-range, symmetric sequence context. It combines nonlinear processing elements with linear feedback loops that store long-range context. The algorithm is applied to sequence-based prediction of protein subcellular localization. It correctly predicts the localization of 93.3% of novel non-plant proteins and 88.4% of novel plant proteins, an improvement over feedforward and standard recurrent networks applied to the same problem. The BLSTM system is available as a web service (http://www.stepc.gr/~synaptic/blstm.html).
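The two ingredients named above can be illustrated with a minimal forward-pass sketch. The first is gated nonlinear units wrapped around a linear, self-recurrent memory. The second is reading the protein sequence in both directions.

The sketch rests on several assumptions. It uses a standard LSTM cell with input, forget and output gates and no peephole connections. It one-hot encodes the 20 standard amino acids. Its readout concatenates the final forward and backward hidden states and applies a softmax over localization classes. These choices, the class count, and all names (`one_hot`, `LSTMCell`, `BLSTMClassifier`) are hypothetical and may differ from the authors' network. Weights are random and untrained, and training is not shown.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed input alphabet)


def one_hot(seq):
    """Encode a protein sequence as 20-dim one-hot vectors; unknown residues map to all zeros."""
    vecs = []
    for aa in seq:
        v = [0.0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(aa)
        if idx >= 0:
            v[idx] = 1.0
        vecs.append(v)
    return vecs


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class LSTMCell:
    """Hypothetical LSTM layer: input, forget and output gates, no peepholes."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, n_in, n_hidden, rng):
        n_cat = n_in + n_hidden
        self.n_hidden = n_hidden
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n_cat)] for _ in range(n_hidden)]
                  for g in self.GATES}
        self.b = {g: [0.0] * n_hidden for g in self.GATES}

    def step(self, x, h, c):
        z = x + h  # concatenate current input with previous hidden state
        gates = {}
        for name in self.GATES:
            pre = [a + b for a, b in zip(matvec(self.W[name], z), self.b[name])]
            act = math.tanh if name == "candidate" else sigmoid  # nonlinear processing elements
            gates[name] = [act(v) for v in pre]
        # Linear feedback loop: c_prev is only scaled by the forget gate, never squashed.
        c_new = [f * cp + i * g for f, cp, i, g in
                 zip(gates["forget"], c, gates["input"], gates["candidate"])]
        h_new = [o * math.tanh(cv) for o, cv in zip(gates["output"], c_new)]
        return h_new, c_new

    def run(self, xs):
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in xs:
            h, c = self.step(x, h, c)
        return h  # final hidden state


class BLSTMClassifier:
    """Hypothetical whole-sequence classifier built from a forward and a backward LSTM."""

    def __init__(self, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        n_in = len(AMINO_ACIDS)
        self.fwd = LSTMCell(n_in, n_hidden, rng)
        self.bwd = LSTMCell(n_in, n_hidden, rng)
        self.W_out = [[rng.uniform(-0.1, 0.1) for _ in range(2 * n_hidden)]
                      for _ in range(n_classes)]

    def predict_proba(self, seq):
        xs = one_hot(seq)
        h_f = self.fwd.run(xs)        # reads N-terminus -> C-terminus
        h_b = self.bwd.run(xs[::-1])  # reads C-terminus -> N-terminus
        logits = matvec(self.W_out, h_f + h_b)  # symmetric context from both directions
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]  # numerically stable softmax
        s = sum(exps)
        return [e / s for e in exps]
```

For example, `BLSTMClassifier(n_hidden=4, n_classes=4, seed=1).predict_proba("MKTAYIAKQR")` returns four class probabilities that sum to 1. The value `n_classes=4` is arbitrary. If the forget gate is held at 1, the cell update reduces to `c_t = c_{t-1} + i_t * g_t`. This is the constant error carousel of the original LSTM, which lets error signals persist across long stretches of sequence. Concatenating the final states is only one way to obtain a fixed-size summary. Pooling per-position outputs of both directions is an alternative.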
Index Terms
- Bidirectional Long Short-Term Memory Networks for Predicting the Subcellular Localization of Eukaryotic Proteins