skip to main content
research-article

Efficient k-nearest neighbor searching in nonordered discrete data spaces

Published:10 June 2010Publication History
Skip Abstract Section

Abstract

Numerous techniques have been proposed in the past for supporting efficient k-nearest neighbor (k-NN) queries in continuous data spaces. Limited work has been reported in the literature for k-NN queries in a nonordered discrete data space (NDDS). Performing k-NN queries in an NDDS raises new challenges. The Hamming distance is usually used to measure the distance between two vectors (objects) in an NDDS. Due to the coarse granularity of the Hamming distance, a k-NN query in an NDDS may lead to a high degree of nondeterminism for the query result. We propose a new distance measure, called Granularity-Enhanced Hamming (GEH) distance, which effectively reduces the number of candidate solutions for a query. We have also implemented k-NN queries using multidimensional database indexing in NDDSs. Further, we use the properties of our multidimensional NDDS index to derive the probability of encountering valid neighbors within specific regions of the index. This probability is used to develop a new search ordering heuristic. Our experiments on synthetic and genomic data sets demonstrate that our index-based k-NN algorithm is efficient in finding k-NNs in both uniform and nonuniform data sets in NDDSs and that our heuristics are effective in improving the performance of such queries.

References

  1. Badel, A., Mornon, J., and Hazout, S. 1992. Searching for geometric molecular shape complementarity using bidimensional surface profiles. J. Molec. Graph. 10, 4, 205--211. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Beckmann, N., Kriegel, H.-P., Schneider, R., and Seeger, B. 1990. The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 322--331. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Berchtold, S., Keim, D. A., and Kriegel, H.-P. 1996. The X-tree: An index structure for high-dimensional data. In Proceedings of the 22nd International Conference on Very Large Databases. 28--39. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Bookstein, A., Kulyukin, V. A., and Raita, T. 2002. Generalized Hamming distance. Inform. Retriev. 5, 4, 353--375. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Chávez, E., Navarro, B., Baeza-Yates, R., and Marroquín, J. L. 2001. Searching in metric spaces. ACM Comput. Surv. 33, 3, 273--321. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Ciaccia, P., Patella, M., and Zezula, P. 1997. M-tree: An efficient access method for similarity search in metric spaces. In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB). 426--435. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Guttman, A. 1988. R-trees: A Dynamic Index Structure for Spatial Searching. Morgan Kaufmann Publishers Inc., San Francisco, CA.Google ScholarGoogle Scholar
  8. Hamming, R. 1950. Error-detecting and error-correcting codes. Bell Syst. Tech. J. 29, 2, 147--160.Google ScholarGoogle ScholarCross RefCross Ref
  9. Henrich, A. 1998. The LSDh-tree: An access structure for feature vectors. In Proceedings of the 14th International Conference on Data Engineering (ICDE). 362--369. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Hjaltason, G. and Samet, H. 2000. Incremental similarity search in multimedia databases. Tech. Rep. TR-4199, Computer-Science Department, University of Maryland.Google ScholarGoogle Scholar
  11. Kent, W. J. 2002. Blat--the blast-like alignment tool. Genome Res. 12, 4, 656--664.Google ScholarGoogle ScholarCross RefCross Ref
  12. Kolahdouzan, M. and Shahabi, C. 2004. Voronoi-based k nearest neighbor search for spatial network databases. In Proceedings of the 13th international Conference on Very Large Data Bases (VLDB). 840--851. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Kolbe, D., Zhu, Q., and Pramanik, S. 2007. On k-nearest neighbor searching in nonordered discrete data spaces. In Proceedings of the 23rd International Conference on Data Engineering (ICDE). 426--435.Google ScholarGoogle Scholar
  14. Korn, F., Sidiropoulos, N., Faloutsos, C., Siegel, E., and Protopapas, Z. 1996. Fast nearest neighbor search in medical image databases. In Proceedings of the 22th International Conference on Very Large Data Bases (VLDB). 215--226. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Lewis, F., Hughes, G. J., Rambaut, A., Pozniak, A., and Brown, A. J. L. 2008. Episodic sexual transmission of HIV revealed by molecular phylodynamics. PLoS Medicine 5, 3.Google ScholarGoogle ScholarCross RefCross Ref
  16. Qian, G. 2004. Principles and applications for supporting similarity queries in non-ordered-discrete and continuous data spaces. Ph.D. thesis, Michigan State University, East Lansing, MI. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Qian, G., Zhu, Q., Xue, Q., and Pramanik, S. 2003. The ND-tree: A dynamic indexing technique for multidimensional non-ordered discrete data spaces. In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB). 620--631. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Qian, G., Zhu, Q., Xue, Q., and Pramanik, S. 2006a. Dynamic indexing for multidimensional non-ordered discrete data spaces using a data-partitioning approach. ACM Trans. Datab. Syst. 31, 2, 439--484. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Qian, G., Zhu, Q., Xue, Q., and Pramanik, S. 2006b. A space-partitioning-based indexing method for multidimensional non-ordered discrete data spaces. ACM Trans. Inform. Syst. 24, 1, 79--110. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Robinson, J. T. 1981. The k-d-b-tree: A search structure for large multidimensional dynamic indexes. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 10--18. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Roussopoulos, N., Kelley, S., and Vincent, F. 1995. Nearest neighbor queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 71--79. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Seidl, T. and Kriegel, H.-P. 1997. Efficient user-adaptable similarity search in large multimedia databases. In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB). 506--515. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Seidl, T. and Kriegel, H.-P. 1998. Optimal multi-step k-nearest neighbor search. SIGMOD Rec. 27, 2, 154--165. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Efficient k-nearest neighbor searching in nonordered discrete data spaces

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Information Systems
      ACM Transactions on Information Systems  Volume 28, Issue 2
      May 2010
      165 pages
      ISSN:1046-8188
      EISSN:1558-2868
      DOI:10.1145/1740592
      Issue’s Table of Contents

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 10 June 2010
      • Accepted: 1 May 2009
      • Revised: 1 March 2009
      • Received: 1 September 2008
      Published in tois Volume 28, Issue 2

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader