DOI: 10.1145/2396761.2398600
short-paper

Star-Join: spatio-textual similarity join

Published: 29 October 2012

ABSTRACT

Location-based services have attracted significant attention as modern mobile phones are now commonly equipped with GPS devices. These services generate large amounts of spatio-textual data, which contains both spatial locations and textual descriptions. Because the same spatio-textual object may have different representations, for example because of GPS deviations or differing user descriptions, efficient methods are needed to integrate spatio-textual data from different sources. In this paper we study a new research problem called spatio-textual similarity join: given two sets of spatio-textual objects, find the similar object pairs. To the best of our knowledge, we are the first to study this problem. We make the following contributions: (1) We develop a filter-and-refine framework and devise several efficient algorithms. We first generate spatial and textual signatures for the objects and build an inverted index on top of these signatures. Then we generate candidate pairs using the inverted lists of signatures. Finally we refine the candidates and generate the final result. (2) We study how to generate high-quality signatures for spatial information. We develop an MBR-prefix based signature to prune large numbers of dissimilar object pairs. (3) Experimental results on real and synthetic datasets show that our algorithms achieve high performance and scale well.
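
The filter-and-refine steps in contribution (1) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not define the similarity functions or the MBR-prefix signature, so everything here is an assumption made for illustration. Spatial similarity is taken to be the Jaccard ratio of two rectangles (overlap area over union area), textual similarity is token-set Jaccard, textual signatures use standard prefix filtering, and spatial signatures are simply the uniform grid cells a rectangle touches, standing in for the MBR-prefix signature. The names `SpatioTextualObject`, `star_join_sketch`, `tau_s`, `tau_t`, and `cell` are hypothetical, and both thresholds are assumed to be greater than 0.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatioTextualObject:
    oid: str
    mbr: tuple          # assumed layout: (x_min, y_min, x_max, y_max)
    tokens: frozenset   # textual description as a set of tokens


def area(m):
    # Area of a rectangle; 0 when the rectangle is empty or inverted.
    return max(0.0, m[2] - m[0]) * max(0.0, m[3] - m[1])


def spatial_sim(m1, m2):
    # Assumed spatial similarity: overlap area divided by union area.
    inter = (max(m1[0], m2[0]), max(m1[1], m2[1]),
             min(m1[2], m2[2]), min(m1[3], m2[3]))
    inter_area = area(inter)
    union_area = area(m1) + area(m2) - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


def textual_sim(a, b):
    # Assumed textual similarity: Jaccard over token sets.
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def textual_signatures(tokens, tau_t):
    # Prefix filtering: under one global token order (lexicographic here),
    # two sets with Jaccard >= tau_t must share a token in these prefixes.
    ordered = sorted(tokens)
    if not ordered:
        return set()
    prefix_len = len(ordered) - math.ceil(tau_t * len(ordered) - 1e-9) + 1
    return set(ordered[:prefix_len])


def spatial_signatures(mbr, cell):
    # Grid stand-in for the MBR-prefix signature: every cell the rectangle
    # touches. Overlapping rectangles always share at least one cell.
    x0, y0, x1, y1 = mbr
    return {(i, j)
            for i in range(math.floor(x0 / cell), math.floor(x1 / cell) + 1)
            for j in range(math.floor(y0 / cell), math.floor(y1 / cell) + 1)}


def star_join_sketch(R, S, tau_s, tau_t, cell=1.0):
    # Step 1: generate signatures for S and build inverted indexes on them.
    text_index, space_index = defaultdict(set), defaultdict(set)
    for idx, s in enumerate(S):
        for sig in textual_signatures(s.tokens, tau_t):
            text_index[sig].add(idx)
        for sig in spatial_signatures(s.mbr, cell):
            space_index[sig].add(idx)

    results = []
    for r in R:
        # Step 2 (filter): a candidate must share a textual signature
        # and a spatial signature with r.
        text_cands = set().union(
            *(text_index.get(sig, ()) for sig in textual_signatures(r.tokens, tau_t)))
        space_cands = set().union(
            *(space_index.get(sig, ()) for sig in spatial_signatures(r.mbr, cell)))
        # Step 3 (refine): verify both similarities exactly.
        for idx in text_cands & space_cands:
            s = S[idx]
            if (spatial_sim(r.mbr, s.mbr) >= tau_s
                    and textual_sim(r.tokens, s.tokens) >= tau_t):
                results.append((r.oid, s.oid))
    return sorted(results)
```

Intersecting the two candidate sets reflects the assumption that a pair must pass both the spatial and the textual threshold, so either signature type alone is enough to discard a pair before the exact similarity computation.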

        • Published in

          ACM Conferences
          CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
          October 2012
          2840 pages
          ISBN:9781450311564
          DOI:10.1145/2396761

          Copyright © 2012 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
