ABSTRACT
This paper applies efficient indexing techniques to protein structure search, in which protein structures are represented as vectors by the 3D Zernike Descriptor (3DZD). The 3DZD compactly represents the surface shape of a protein's tertiary structure as a vector, and this simplified representation accelerates structural search. However, a further speed-up is needed for scenarios in which multiple users access the database simultaneously. We address this need by applying two indexing techniques, iDistance and iKernel, to the 3DZDs. The results show that both iDistance and iKernel significantly increase search speed. In addition, we introduce an extended approach to protein structure search that exploits a characteristic of the 3DZD. In the extended approach, the index structure is constructed using only the first few elements of each 3DZD. To find the top-k similar structures, the top 10 × k similar structures are first selected using the reduced index structure, and the top k are then selected from these candidates by comparing their full 3DZDs. With the indexing techniques, search time was reduced by 69.6% using iDistance, 77% using iKernel, 77.4% using extended iDistance, and 87.9% using extended iKernel.
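The extended approach is a filter-and-refine scheme: a cheap index over truncated descriptors proposes 10 × k candidates, and exact full-descriptor distances pick the final k. The Python sketch below illustrates this with a simplified iDistance as the filter. It is not the authors' implementation, and several details are assumptions: Euclidean distance between descriptors, reference points sampled from the data (a clustering step such as k-means is a common alternative), a sorted key list in place of iDistance's B+-tree, a fixed radius step, and the hypothetical names `build_idistance`, `knn`, `build_extended`, and `extended_topk`. iKernel is not shown.

```python
import bisect
import math
import random


def build_idistance(points, refs):
    """Map each point p to the 1-D key i*c + dist(p, O_i), where O_i is p's nearest reference point."""
    assign, radii = [], [0.0] * len(refs)
    for p in points:
        dists = [math.dist(p, o) for o in refs]
        i = min(range(len(refs)), key=dists.__getitem__)
        assign.append((i, dists[i]))
        radii[i] = max(radii[i], dists[i])  # partition radius
    # c exceeds every partition radius, so key ranges of different partitions never overlap.
    c = 1.0 + max(radii)
    # A sorted key list stands in for the B+-tree used by iDistance (assumption for brevity).
    keys = sorted((i * c + d, idx) for idx, (i, d) in enumerate(assign))
    return {"points": points, "refs": refs, "c": c, "keys": keys, "radii": radii}


def knn(index, q, k, step=0.05, eps=1e-9):
    """Exact k-NN: grow the search radius r until at least k points lie within r of q."""
    pts, refs, c = index["points"], index["refs"], index["c"]
    keys, radii = index["keys"], index["radii"]
    k = min(k, len(pts))
    if k <= 0:
        return []
    dq = [math.dist(q, o) for o in refs]
    r = step
    while True:
        found = {}  # recomputed from scratch each round for clarity, not speed
        for i, (d, rad) in enumerate(zip(dq, radii)):
            if d - r > rad + eps:
                continue  # the query ball cannot reach partition i
            # Triangle inequality: a point p within r of q has |dist(p, O_i) - d| <= r.
            lo = bisect.bisect_left(keys, (i * c + max(0.0, d - r) - eps, -1))
            hi = bisect.bisect_right(keys, (i * c + min(rad, d + r) + eps, len(pts)))
            for _, idx in keys[lo:hi]:
                dist = math.dist(q, pts[idx])
                if dist <= r:
                    found[idx] = dist
        if len(found) >= k:
            return sorted(found, key=lambda j: (found[j], j))[:k]
        r += step


def build_extended(vectors, m, n_refs, seed=0):
    """Reduced index: only the first m components of each descriptor are indexed."""
    prefix = [list(v[:m]) for v in vectors]
    refs = random.Random(seed).sample(prefix, n_refs)
    index = build_idistance(prefix, refs)
    index["m"] = m
    return index


def extended_topk(index, vectors, q, k, factor=10):
    """Filter factor*k candidates on the prefix index, then re-rank them on full vectors."""
    candidates = knn(index, list(q[: index["m"]]), factor * k)
    return sorted(candidates, key=lambda j: (math.dist(q, vectors[j]), j))[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    # Random stand-ins for 3DZD vectors; the 121-component length is an assumption.
    db = [[rng.random() for _ in range(121)] for _ in range(500)]
    query = [rng.random() for _ in range(121)]
    idx = build_extended(db, m=20, n_refs=8)  # m=20 is a hypothetical prefix length
    print(extended_topk(idx, db, query, k=5))
```

Because only the first m components are indexed, the filter step can miss a structure whose full-descriptor distance is small; the 10 × k candidate pool makes such misses less likely but does not rule them out. When m equals the full descriptor length, the filter is exact and the sketch returns the true top-k.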
Index Terms
- Indexing methods for efficient protein 3D surface search