Abstract
The massive amount of data and large variety of data distributions in the big data era call for access methods that are efficient in both query processing and index bulk-loading, and over both practical and worst-case workloads. To address this need, we revisit a classic multidimensional access method - the R-tree. We propose a novel R-tree packing strategy that produces R-trees with an asymptotically optimal I/O complexity for window queries in the worst case. Our experiments show that the R-trees produced by the proposed strategy are highly efficient on real and synthetic data of different distributions. The proposed strategy is also simple to parallelize, since it relies only on sorting. We propose a parallel algorithm for R-tree bulk-loading based on the proposed packing strategy, and analyze its performance under the massively parallel communication model. Experimental results confirm the efficiency and scalability of the parallel algorithm over large data sets.
- http://www.pokemongo.com.Google Scholar
- http://en.wikipedia.org/wiki/Google_Search.Google Scholar
- https://www.wired.com/2016/09/pokemon-go-just-fine-without/.Google Scholar
- http://www.microsoft.com/sqlserver/2008/en/us/spatial-data.aspx.Google Scholar
- http://download.oracle.com/otndocs/products/spatial/pdf/spatial_features_jsirev.pdf.Google Scholar
- https://www.cse.ust.hk/~yike/prtree/.Google Scholar
- http://madalgo.au.dk/tpie/.Google Scholar
- https://nectar.org.au/.Google Scholar
- https://www.census.gov/geo/maps-data/data/tiger-line.html.Google Scholar
- D. Achakeev, M. Seidemann, M. Schmidt, and B. Seeger. Sort-based parallel loading of r-trees. In 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, pages 62--70, 2012. Google ScholarDigital Library
- P. K. Agarwal, L. Arge, O. Procopiuc, and J. S. Vitter. A framework for index bulk loading and dynamization. In 28th International Colloquium on Automata, Languages and Programming, pages 115--127, 2001. Google ScholarDigital Library
- P. K. Agarwal, M. de Berg, J. Gudmundsson, M. Hammar, and H. J. Haverkort. Box-trees and r-trees with near-optimal query time. In 17th Annual Symposium on Computational Geometry (SCG), pages 124--133, 2001. Google ScholarDigital Library
- P. K. Agarwal, K. Fox, K. Munagala, and A. Nath. Parallel algorithms for constructing range and nearest-neighbor searching data structures. In PODS, pages 429--440, 2016. Google ScholarDigital Library
- A. Andoni, A. Nikolov, K. Onak, and G. Yaroslavtsev. Parallel algorithms for geometric graph problems. In STOC, pages 574--583, 2014. Google ScholarDigital Library
- L. Arge, M. D. Berg, H. Haverkort, and K. Yi. The priority r-tree: A practically efficient and worst-case optimal r-tree. ACM Transactions on Algorithms, 4(1):9:1--9:30, 2008. Google ScholarDigital Library
- P. Beame, P. Koutris, and D. Suciu. Communication steps for parallel query processing. In PODS, pages 273--284, 2013. Google ScholarDigital Library
- N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The r*-tree: An efficient and robust access method for points and rectangles. In SIGMOD, pages 322--331, 1990. Google ScholarDigital Library
- J. L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509--517, 1975. Google ScholarDigital Library
- S. Berchtold, D. A. Keim, and H.-P. Kriegel. The x-tree : An index structure for high-dimensional data. In VLDB, pages 28--39, 1996. Google ScholarDigital Library
- M. Berg, M. Kreveld, M. Overmars, and O. C. Schwarzkopf. Computational Geometry. Springer Berlin Heidelberg, 2000.Google Scholar
- B. Chazelle. Functional approach to data structures and its use in multidimensional searching. SIAM Journal on Computing, 17(3):427--462, 1988. Google ScholarDigital Library
- J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107--113, 2008. Google ScholarDigital Library
- D. J. DeWitt, N. Kabra, J. Luo, J. M. Patel, and J.-B. Yu. Client-server paradise. In VLDB, pages 558--569, 1994. Google ScholarDigital Library
- H. N. Gabow, J. L. Bentley, and R. E. Tarjan. Scaling and related techniques for geometry problems. In STOC, pages 135--143, 1984. Google ScholarDigital Library
- V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys, 30(2):170--231, 1998. Google ScholarDigital Library
- Y. J. García R, M. A. López, and S. T. Leutenegger. A greedy algorithm for bulk loading r-trees. In 6th ACM International Symposium on Advances in Geographic Information Systems, pages 163--164, 1998. Google ScholarDigital Library
- M. T. Goodrich. Communication-efficient parallel sorting. SIAM Journal on Computing, 29(2):416--432, 1999. Google ScholarDigital Library
- R. Grossi and G. F. Italiano. Efficient cross-trees for external memory. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms and Visualization, pages 87--106, 1999. Google ScholarDigital Library
- A. Guttman. R-trees: A dynamic index structure for spatial searching. In SIGMOD, pages 47--57, 1984. Google ScholarDigital Library
- H. Haverkort and F. V. Walderveen. Four-dimensional hilbert curves for r-trees. Journal of Experimental Algorithmics, 16:1--19, 2008. Google ScholarDigital Library
- H. V. Jagadish. Spatial search with polyhedra. In ICDE, pages 311--319, 1990. Google ScholarDigital Library
- H. V. Jagadish. Analysis of the hilbert curve for representing two-dimensional space. Information Processing Letters, 62(1):17--22, 1997. Google ScholarDigital Library
- I. Kamel and C. Faloutsos. Parallel r-trees. In SIGMOD, pages 195--204, 1992. Google ScholarDigital Library
- I. Kamel and C. Faloutsos. Hilbert r-tree: An improved r-tree using fractals. In VLDB, pages 500--509, 1994. Google ScholarDigital Library
- K. V. R. Kanth and A. K. Singh. Optimal dynamic range searching in non-replicating index structures. In ICDT, pages 257--276, 1999. Google ScholarDigital Library
- N. Koudas, C. Faloutsos, and I. Kamel. Declustering spatial databases on a multi-computer architecture. In EDBT, pages 592--614, 1996. Google ScholarDigital Library
- S. T. Leutenegger, J. M. Edgington, and M. A. Lopez. STR: A simple and efficient algorithm for r-tree packing. Technical report, 1997. Google ScholarDigital Library
- A. Mondal, M. Kitsuregawa, B. C. Ooi, and K. L. Tan. R-tree-based data migration and self-tuning strategies in shared-nothing spatial databases. In 9th ACM International Symposium on Advances in Geographic Information Systems, pages 28--33, 2001. Google ScholarDigital Library
- B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz. Analysis of the clustering properties of the hilbert space-filling curve. IEEE Transactions on Knowledge and Data Engineering, 13(1):124--141, 2001. Google ScholarDigital Library
- Y. Ohsawa and M. Sakauchi. A new tree type data structure with homogeneous nodes suitable for a very large spatial database. In ICDE, pages 296--303, 1990. Google ScholarDigital Library
- J. A. Orenstein and T. H. Merrett. A class of data structures for associative searching. In PODS, pages 181--190, 1984. Google ScholarDigital Library
- A. Papadopoulos and Y. Manolopoulos. Parallel bulk-loading of spatial data. Parallel Computing, 29(10):1419--1444, 2003. Google ScholarDigital Library
- N. Roussopoulos and D. Leifker. Direct spatial search on pictorial databases using packed r-trees. In SIGMOD, pages 17--31, 1985. Google ScholarDigital Library
- B. Schnitzer and S. T. Leutenegger. Master-client r-trees: a new parallel r-tree architecture. In SSDBM, pages 68--77, 1999. Google ScholarDigital Library
- T. K. Sellis, N. Roussopoulos, and C. Faloutsos. The r+-tree: A dynamic index for multi-dimensional objects. In VLDB, pages 507--518, 1987. Google ScholarDigital Library
- Y. Tao, W. Lin, and X. Xiao. Minimal mapreduce algorithms. In SIGMOD, pages 529--540, 2013. Google ScholarDigital Library
- P. Xu and S. Tirthapura. Optimality of clustering properties of space-filling curves. ACM Transactions on Database Systems, 39(2):10:1--10:27, 2014. Google ScholarDigital Library
- S. You, J. Zhang, and L. Gruenwald. Parallel spatial query processing on gpus using r-trees. In 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, pages 23--31, 2013. Google ScholarDigital Library
- M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster computing with working sets. In 2nd USENIX Conference on Hot Topics in Cloud Computing, pages 10--10, 2010. Google ScholarDigital Library
- R. Zhang, P. Kalnis, B. C. Ooi, and K. Tan. Generalized multidimensional data mapping and query processing. ACM Transactions on Database Systems, 30(3):661--697, 2005. Google ScholarDigital Library
- R. Zhang, B. C. Ooi, and K.-L. Tan. Making the pyramid technique robust to query types and workloads. In ICDE, pages 313--324, 2004. Google ScholarDigital Library
- R. Zhang, J. Qi, M. Stradling, and J. Huang. Towards a painless index for spatial objects. ACM Transactions on Database Systems, 39(3):19:1--19:42, 2014. Google ScholarDigital Library
Index Terms
- Theoretically optimal and empirically efficient r-trees with strong parallelizability
Recommendations
Packing R-trees with Space-filling Curves: Theoretical Optimality, Empirical Efficiency, and Bulk-loading Parallelizability
Best of ICDT 2019 and Regular PapersThe massive amount of data and large variety of data distributions in the big data era call for access methods that are efficient in both query processing and index management, and over both practical and worst-case workloads. To address this need, we ...
Theoretically optimal and empirically efficient r-trees with strong parallelizability
The massive amount of data and large variety of data distributions in the big data era call for access methods that are efficient in both query processing and index bulk-loading, and over both practical and worst-case workloads. To address this need, we ...
Sort-based parallel loading of R-trees
BigSpatial '12: Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial DataDue to the increasing amount of spatial data, parallel algorithms for processing big spatial data become more and more important. In particular, the shared nothing architecture is attractive as it offers low cost data processing. Moreover, popular ...
Comments