skip to main content
10.1145/2457317.2457389acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article

Trie-based similarity search and join

Published:18 March 2013Publication History

ABSTRACT

Driven by the increasing demands from applications such as data cleansing, integration, and bioinformatics, approximate string matching queries have gain much attention recently. In this paper, we present the design and implementation of a trie-based system which supports both string similarity search and join based on our recent work [23].

References

  1. R. J. Bayardo, Y. Ma, and R. Srikant. Scaling up all pairs similarity search. In WWW, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Chaudhuri, V. Ganti, and R. Kaushik. A primitive operator for similarity joins in data cleaning. In ICDE, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. S. Chaudhuri and R. Kaushik. Extending autocompletion to tolerate errors. In SIGMOD Conference, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. J. Feng, J. Wang, and G. Li. Trie-join: a trie-based method for efficient string similarity joins. VLDB J., 21(4):437--461, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. L. Gravano, P. G. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava. Approximate string joins in a database (almost) for free. In VLDB, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Hadjieleftheriou and C. Li. Efficient approximate search on string collections. PVLDB, 2(2):1660--1661, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Hadjieleftheriou and D. Srivastava. Approximate string processing. Foundations and Trends in Databases, 2(4):267--402, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. T. Kahveci and A. K. Singh. Efficient index structures for string databases. In VLDB, pages 351--360, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. G. Li, S. Ji, C. Li, and J. Feng. Efficient fuzzy full-text type-ahead search. The VLDB Journal, 20(4):617--640, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. X. Lin and W. Wang. Set and string similarity queries: A survey. Chinese Journal of Computers, (10):1853--1862, 2011.Google ScholarGoogle Scholar
  11. W. J. Masek and M. Paterson. A faster algorithm computing string edit distances. J. Comput. Syst. Sci., 20(1):18--31, 1980.Google ScholarGoogle ScholarCross RefCross Ref
  12. J. Qin, W. Wang, Y. Lu, C. Xiao, and X. Lin. Efficient exact edit similarity query processing with the asymmetric signature scheme. In SIGMOD, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. Sokol, G. Benson, and J. Tojeira. Tandem repeats over the edit distance. Bioinformatics, 23(2):30--35, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. B. S. T. Bocek, E. Hunt. Fast Similarity Search in Large Dictionaries. Technical Report ifi-2007.02, Department of Informatics, University of Zurich, April 2007. http://fastss.csg.uzh.ch/.Google ScholarGoogle Scholar
  15. E. Ukkonen. Algorithms for approximate string matching. Information and Control, 64(1-3):100--118, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Wang, J. Feng, and G. Li. Trie-join: Efficient trie-based string similarity joins with edit. In VLDB, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. W. Wang. Similarity joins as stronger metric operations. SIGSPATIAL Special, 2:24--27, July 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. W. Wang, J. Qin, C. Xiao, X. Lin, and H. T. Shen. VChunkJoin: An efficient algorithm for edit similarity joins. IEEE Trans. Knowl. Data Eng., 2012.Google ScholarGoogle Scholar
  19. W. Wang, C. Xiao, X. Lin, and C. Zhang. Efficient approximate entity extraction with edit constraints. In SIMGOD, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. C. Xiao, J. Qin, W. Wang, Y. Ishikawa, K. Tsuda, and K. Sadakane. Efficient error-tolerant query autocompletion. Proceedings of the VLDB Endowment, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. C. Xiao, W. Wang, and X. Lin. Ed-Join: an efficient algorithm for similarity joins with edit distance constraints. PVLDB, 1(1):933--944, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. C. Xiao, W. Wang, X. Lin, and J. X. Yu. Efficient similarity joins for near duplicate detection. In WWW, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. X. Zhou, J. Qin, C. Xiao, W. Wang, X. Lin, and Y. Ishikawa. LEVA: An efficient query processing algorithm for error tolerant autocompletion. Submitted for publication.Google ScholarGoogle Scholar

Index Terms

  1. Trie-based similarity search and join

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Other conferences
          EDBT '13: Proceedings of the Joint EDBT/ICDT 2013 Workshops
          March 2013
          423 pages
          ISBN:9781450315999
          DOI:10.1145/2457317

          Copyright © 2013 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 18 March 2013

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          EDBT '13 Paper Acceptance Rate7of10submissions,70%Overall Acceptance Rate7of10submissions,70%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader