DOI: 10.1145/2390068.2390078
research-article

Indexing methods for efficient protein 3D surface search

Published: 29 October 2012

ABSTRACT

This paper applies efficient indexing techniques to protein structure search, where protein structures are represented as vectors by the 3D Zernike Descriptor (3DZD). The 3DZD compactly represents the surface shape of a protein tertiary structure as a vector, and this simplified representation accelerates structural search. However, a further speedup is needed for scenarios in which multiple users access the database simultaneously. We address this need by applying two indexing techniques, iDistance and iKernel, to the 3DZDs. The results show that both iDistance and iKernel significantly increase search speed. In addition, we introduce an extended approach to protein structure search that combines these indexing techniques with a characteristic of the 3DZD. In the extended approach, the index structure is constructed using only the first few numbers in each 3DZD. To find the top-k similar structures, the top-(10 × k) similar structures are first selected using the reduced index structure, and the top-k structures are then chosen from these candidates using the similarity measure on their full 3DZDs. With the indexing techniques, search time was reduced by 69.6% using iDistance, 77% using iKernel, 77.4% using extended iDistance, and 87.9% using extended iKernel.
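The iDistance search and the extended filter-and-refine step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions: Euclidean distance serves as the 3DZD similarity measure, reference points are plain k-means centroids, a sorted list searched with `bisect` stands in for the B+-tree that iDistance uses, the search radius is doubled each round, and the data are synthetic 121-dimensional vectors rather than real 3DZDs. The names `IDistanceIndex` and `extended_knn` and the prefix length of 20 are hypothetical. iKernel is not shown.

```python
import bisect

import numpy as np


class IDistanceIndex:
    """Minimal iDistance sketch over Euclidean distance (hypothetical name).

    Each vector p in partition j gets the 1-D key j * c + ||p - O_j||,
    where O_j is the partition's reference point. A sorted Python list
    searched with bisect stands in for the B+-tree used by iDistance.
    """

    def __init__(self, data, n_refs=8, kmeans_iters=10, seed=0):
        self.data = np.asarray(data, dtype=float)
        rng = np.random.default_rng(seed)
        # Reference points: plain k-means (Lloyd) centroids, an assumption.
        refs = self.data[rng.choice(len(self.data), n_refs, replace=False)]
        for _ in range(kmeans_iters):
            sq = ((self.data[:, None, :] - refs[None, :, :]) ** 2).sum(axis=2)
            labels = sq.argmin(axis=1)
            for j in range(n_refs):
                if np.any(labels == j):
                    refs[j] = self.data[labels == j].mean(axis=0)
        self.refs = refs
        dists = np.linalg.norm(self.data[:, None, :] - refs[None, :, :], axis=2)
        self.part = dists.argmin(axis=1)
        self.rdist = dists[np.arange(len(self.data)), self.part]
        # maxd[j]: radius of partition j (0 if the partition is empty).
        self.maxd = np.array([
            self.rdist[self.part == j].max() if np.any(self.part == j) else 0.0
            for j in range(n_refs)
        ])
        # c exceeds every radius, so partitions' key ranges never overlap.
        self.c = self.maxd.max() + 1.0
        keys = self.part * self.c + self.rdist
        order = np.argsort(keys)
        self.keys = keys[order].tolist()
        self.ids = order

    def knn(self, q, k):
        """Exact k-NN: grow radius r until at least k points lie within r of q."""
        q = np.asarray(q, dtype=float)
        k = min(k, len(self.data))
        qd = np.linalg.norm(self.refs - q, axis=1)
        r = max(self.maxd.max(), 1e-9) / 10
        while True:
            chunks = []
            for j in range(len(self.refs)):
                if qd[j] - r > self.maxd[j]:
                    continue  # query sphere cannot intersect partition j
                # Triangle inequality: any p with ||p - q|| <= r has its key here.
                lo = j * self.c + max(0.0, qd[j] - r)
                hi = j * self.c + min(self.maxd[j], qd[j] + r)
                a = bisect.bisect_left(self.keys, lo)
                b = bisect.bisect_right(self.keys, hi)
                chunks.append(self.ids[a:b])
            cand = np.concatenate(chunks) if chunks else np.empty(0, dtype=int)
            d = np.linalg.norm(self.data[cand] - q, axis=1)
            inside = d <= r
            if inside.sum() >= k:
                sel, dsel = cand[inside], d[inside]
                top = np.argsort(dsel, kind="stable")[:k]
                return sel[top], dsel[top]
            r *= 2  # simplification: the original grows r by a fixed increment


def extended_knn(reduced_index, full, q, k, factor=10):
    """Filter with an index on 3DZD prefixes, then re-rank on full vectors."""
    q = np.asarray(q, dtype=float)
    p = reduced_index.data.shape[1]  # number of leading values indexed
    cand, _ = reduced_index.knn(q[:p], factor * k)  # top-(factor * k) candidates
    d = np.linalg.norm(full[cand] - q, axis=1)
    top = np.argsort(d, kind="stable")[:k]
    return cand[top], d[top]


rng = np.random.default_rng(42)
db = rng.random((2000, 121))  # synthetic stand-in for a 3DZD database
query = rng.random(121)
ids, dists = IDistanceIndex(db, n_refs=16).knn(query, 5)
reduced = IDistanceIndex(db[:, :20], n_refs=16)  # prefix length 20 is hypothetical
approx_ids, approx_dists = extended_knn(reduced, db, query, 5)
```

In this sketch the plain iDistance query is exact, whereas the extended search is approximate: it only re-ranks the 10 × k candidates returned by the prefix index, so a true neighbor whose leading values differ from the query's can be missed. Choosing `factor` so that `factor * k` covers the whole database makes the extended search exact, which is a convenient sanity check.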

Published in

DTMBIO '12: Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics
October 2012
92 pages
ISBN: 9781450317160
DOI: 10.1145/2390068

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

Acceptance Rates

Overall acceptance rate: 41 of 247 submissions, 17%