ABSTRACT
Location-based services have attracted significant attention because modern mobile phones are equipped with GPS devices. These services generate large amounts of spatio-textual data, which contain both spatial locations and textual descriptions. Since the same spatio-textual object may have different representations, for example because of GPS deviations or differing user descriptions, efficient methods are needed to integrate spatio-textual data from different sources. In this paper we study a new research problem called spatio-textual similarity join: given two sets of spatio-textual objects, find all similar object pairs. To the best of our knowledge, we are the first to study this problem. We make the following contributions: (1) We develop a filter-and-refine framework and devise several efficient algorithms. We first generate spatial and textual signatures for the objects and build inverted indexes on top of these signatures. Then we generate candidate pairs using the inverted lists of the signatures. Finally, we refine the candidates to produce the final result. (2) We study how to generate high-quality signatures for spatial information, and develop an MBR-prefix based signature that prunes large numbers of dissimilar object pairs. (3) Experimental results on real and synthetic datasets show that our algorithms achieve high performance and scale well.
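The filter-and-refine framework described above can be illustrated with a minimal sketch. This is a hypothetical example, not the authors' Star-Join implementation. It assumes that each object is an axis-aligned rectangle with a token set, and that a pair counts as similar when both the spatial Jaccard (intersection area over union area) and the textual Jaccard reach thresholds `tau_s` and `tau_t`, each strictly positive. In place of the paper's MBR-prefix signature, it uses two simpler substitutes: standard textual prefix filtering and a uniform-grid spatial signature. All names (`Obj`, `star_join_sketch`, `text_prefix`, `grid_cells`, the cell size `g`) are illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, NamedTuple, Sequence, Set, Tuple

# Hypothetical representation: an axis-aligned region (x1, y1, x2, y2) plus a token set.
Rect = Tuple[float, float, float, float]


class Obj(NamedTuple):
    oid: int
    rect: Rect
    tokens: frozenset


def area(r: Rect) -> float:
    # Degenerate or inverted rectangles have zero area.
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def spatial_jaccard(a: Rect, b: Rect) -> float:
    # Assumed spatial similarity: intersection area / union area.
    inter = area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def text_jaccard(x: frozenset, y: frozenset) -> float:
    # Assumed textual similarity: Jaccard coefficient of the token sets.
    union = len(x | y)
    return len(x & y) / union if union else 0.0


def text_prefix(tokens: frozenset, freq: Dict[str, int], tau_t: float) -> List[str]:
    # Textual signature: prefix of the tokens sorted by a global order (rarest first).
    # If Jaccard >= tau_t, the two prefixes must share a token (prefix filtering).
    ranked = sorted(tokens, key=lambda t: (freq[t], t))
    # The small epsilon guards against float round-up; it can only lengthen the prefix.
    keep = len(ranked) - math.ceil(tau_t * len(ranked) - 1e-9) + 1
    return ranked[:keep]


def grid_cells(r: Rect, g: float) -> Set[Tuple[int, int]]:
    # Spatial signature (a stand-in, not the MBR-prefix signature): every cell of a
    # uniform grid with side g that the rectangle touches. Rectangles whose
    # intersection has positive area always share at least one such cell.
    xs = range(math.floor(r[0] / g), math.floor(r[2] / g) + 1)
    ys = range(math.floor(r[1] / g), math.floor(r[3] / g) + 1)
    return {(x, y) for x in xs for y in ys}


def star_join_sketch(R: Sequence[Obj], S: Sequence[Obj], tau_s: float, tau_t: float,
                     g: float = 1.0) -> List[Tuple[int, int]]:
    if not (0 < tau_s <= 1 and 0 < tau_t <= 1):
        raise ValueError("the filters are only sound for thresholds in (0, 1]")
    freq: Dict[str, int] = defaultdict(int)
    for o in list(R) + list(S):
        for t in o.tokens:
            freq[t] += 1

    # Build inverted indexes over the signatures of R.
    text_idx: Dict[str, Set[int]] = defaultdict(set)
    cell_idx: Dict[Tuple[int, int], Set[int]] = defaultdict(set)
    for r in R:
        for t in text_prefix(r.tokens, freq, tau_t):
            text_idx[t].add(r.oid)
        for c in grid_cells(r.rect, g):
            cell_idx[c].add(r.oid)
    by_id = {r.oid: r for r in R}

    result = []
    for s in S:
        # Filter: candidates must share a textual prefix token AND a grid cell.
        text_cands: Set[int] = set()
        for t in text_prefix(s.tokens, freq, tau_t):
            text_cands |= text_idx.get(t, set())
        space_cands: Set[int] = set()
        for c in grid_cells(s.rect, g):
            space_cands |= cell_idx.get(c, set())
        # Refine: verify the exact similarities of the surviving candidates.
        for rid in sorted(text_cands & space_cands):
            r = by_id[rid]
            if (spatial_jaccard(r.rect, s.rect) >= tau_s
                    and text_jaccard(r.tokens, s.tokens) >= tau_t):
                result.append((rid, s.oid))
    return result


R = [Obj(1, (0, 0, 2, 2), frozenset({"coffee", "shop", "main"})),
     Obj(2, (10, 10, 12, 12), frozenset({"coffee", "shop"}))]
S = [Obj(7, (0.2, 0.2, 2.2, 2.2), frozenset({"coffee", "shop", "main", "st"})),
     Obj(8, (10, 10, 12, 12), frozenset({"book", "store"}))]
print(star_join_sketch(R, S, tau_s=0.5, tau_t=0.5))  # [(1, 7)]
```

Both signatures in this sketch encode necessary conditions, so the filter step never discards a true result; only the refine step computes exact similarities. In the example, object 2 of R and object 7 of S have a textual Jaccard of exactly 0.5, but their regions do not overlap, so the grid signature prunes the pair before refinement.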
Index Terms
- Star-Join: spatio-textual similarity join