skip to main content
research-article

Theoretically optimal and empirically efficient r-trees with strong parallelizability

Published:05 October 2018Publication History
Skip Abstract Section

Abstract

The massive amount of data and large variety of data distributions in the big data era call for access methods that are efficient in both query processing and index bulk-loading, and over both practical and worst-case workloads. To address this need, we revisit a classic multidimensional access method - the R-tree. We propose a novel R-tree packing strategy that produces R-trees with an asymptotically optimal I/O complexity for window queries in the worst case. Our experiments show that the R-trees produced by the proposed strategy are highly efficient on real and synthetic data of different distributions. The proposed strategy is also simple to parallelize, since it relies only on sorting. We propose a parallel algorithm for R-tree bulk-loading based on the proposed packing strategy, and analyze its performance under the massively parallel communication model. Experimental results confirm the efficiency and scalability of the parallel algorithm over large data sets.

References

  1. http://www.pokemongo.com.Google ScholarGoogle Scholar
  2. http://en.wikipedia.org/wiki/Google_Search.Google ScholarGoogle Scholar
  3. https://www.wired.com/2016/09/pokemon-go-just-fine-without/.Google ScholarGoogle Scholar
  4. http://www.microsoft.com/sqlserver/2008/en/us/spatial-data.aspx.Google ScholarGoogle Scholar
  5. http://download.oracle.com/otndocs/products/spatial/pdf/spatial_features_jsirev.pdf.Google ScholarGoogle Scholar
  6. https://www.cse.ust.hk/~yike/prtree/.Google ScholarGoogle Scholar
  7. http://madalgo.au.dk/tpie/.Google ScholarGoogle Scholar
  8. https://nectar.org.au/.Google ScholarGoogle Scholar
  9. https://www.census.gov/geo/maps-data/data/tiger-line.html.Google ScholarGoogle Scholar
  10. D. Achakeev, M. Seidemann, M. Schmidt, and B. Seeger. Sort-based parallel loading of r-trees. In 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, pages 62--70, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. P. K. Agarwal, L. Arge, O. Procopiuc, and J. S. Vitter. A framework for index bulk loading and dynamization. In 28th International Colloquium on Automata, Languages and Programming, pages 115--127, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. P. K. Agarwal, M. de Berg, J. Gudmundsson, M. Hammar, and H. J. Haverkort. Box-trees and r-trees with near-optimal query time. In 17th Annual Symposium on Computational Geometry (SCG), pages 124--133, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. P. K. Agarwal, K. Fox, K. Munagala, and A. Nath. Parallel algorithms for constructing range and nearest-neighbor searching data structures. In PODS, pages 429--440, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Andoni, A. Nikolov, K. Onak, and G. Yaroslavtsev. Parallel algorithms for geometric graph problems. In STOC, pages 574--583, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. L. Arge, M. D. Berg, H. Haverkort, and K. Yi. The priority r-tree: A practically efficient and worst-case optimal r-tree. ACM Transactions on Algorithms, 4(1):9:1--9:30, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. P. Beame, P. Koutris, and D. Suciu. Communication steps for parallel query processing. In PODS, pages 273--284, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The r*-tree: An efficient and robust access method for points and rectangles. In SIGMOD, pages 322--331, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509--517, 1975. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S. Berchtold, D. A. Keim, and H.-P. Kriegel. The x-tree : An index structure for high-dimensional data. In VLDB, pages 28--39, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. Berg, M. Kreveld, M. Overmars, and O. C. Schwarzkopf. Computational Geometry. Springer Berlin Heidelberg, 2000.Google ScholarGoogle Scholar
  21. B. Chazelle. Functional approach to data structures and its use in multidimensional searching. SIAM Journal on Computing, 17(3):427--462, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107--113, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. D. J. DeWitt, N. Kabra, J. Luo, J. M. Patel, and J.-B. Yu. Client-server paradise. In VLDB, pages 558--569, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. H. N. Gabow, J. L. Bentley, and R. E. Tarjan. Scaling and related techniques for geometry problems. In STOC, pages 135--143, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. V. Gaede and O. Günther. Multidimensional access methods. ACM Computing Surveys, 30(2):170--231, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Y. J. García R, M. A. López, and S. T. Leutenegger. A greedy algorithm for bulk loading r-trees. In 6th ACM International Symposium on Advances in Geographic Information Systems, pages 163--164, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. M. T. Goodrich. Communication-efficient parallel sorting. SIAM Journal on Computing, 29(2):416--432, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. R. Grossi and G. F. Italiano. Efficient cross-trees for external memory. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms and Visualization, pages 87--106, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. A. Guttman. R-trees: A dynamic index structure for spatial searching. In SIGMOD, pages 47--57, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. H. Haverkort and F. V. Walderveen. Four-dimensional hilbert curves for r-trees. Journal of Experimental Algorithmics, 16:1--19, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. H. V. Jagadish. Spatial search with polyhedra. In ICDE, pages 311--319, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. H. V. Jagadish. Analysis of the hilbert curve for representing two-dimensional space. Information Processing Letters, 62(1):17--22, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. I. Kamel and C. Faloutsos. Parallel r-trees. In SIGMOD, pages 195--204, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. I. Kamel and C. Faloutsos. Hilbert r-tree: An improved r-tree using fractals. In VLDB, pages 500--509, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. K. V. R. Kanth and A. K. Singh. Optimal dynamic range searching in non-replicating index structures. In ICDT, pages 257--276, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. N. Koudas, C. Faloutsos, and I. Kamel. Declustering spatial databases on a multi-computer architecture. In EDBT, pages 592--614, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. S. T. Leutenegger, J. M. Edgington, and M. A. Lopez. STR: A simple and efficient algorithm for r-tree packing. Technical report, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. A. Mondal, M. Kitsuregawa, B. C. Ooi, and K. L. Tan. R-tree-based data migration and self-tuning strategies in shared-nothing spatial databases. In 9th ACM International Symposium on Advances in Geographic Information Systems, pages 28--33, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz. Analysis of the clustering properties of the hilbert space-filling curve. IEEE Transactions on Knowledge and Data Engineering, 13(1):124--141, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Y. Ohsawa and M. Sakauchi. A new tree type data structure with homogeneous nodes suitable for a very large spatial database. In ICDE, pages 296--303, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. J. A. Orenstein and T. H. Merrett. A class of data structures for associative searching. In PODS, pages 181--190, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. A. Papadopoulos and Y. Manolopoulos. Parallel bulk-loading of spatial data. Parallel Computing, 29(10):1419--1444, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. N. Roussopoulos and D. Leifker. Direct spatial search on pictorial databases using packed r-trees. In SIGMOD, pages 17--31, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. B. Schnitzer and S. T. Leutenegger. Master-client r-trees: a new parallel r-tree architecture. In SSDBM, pages 68--77, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. T. K. Sellis, N. Roussopoulos, and C. Faloutsos. The r+-tree: A dynamic index for multi-dimensional objects. In VLDB, pages 507--518, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Y. Tao, W. Lin, and X. Xiao. Minimal mapreduce algorithms. In SIGMOD, pages 529--540, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. P. Xu and S. Tirthapura. Optimality of clustering properties of space-filling curves. ACM Transactions on Database Systems, 39(2):10:1--10:27, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. S. You, J. Zhang, and L. Gruenwald. Parallel spatial query processing on gpus using r-trees. In 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, pages 23--31, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster computing with working sets. In 2nd USENIX Conference on Hot Topics in Cloud Computing, pages 10--10, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. R. Zhang, P. Kalnis, B. C. Ooi, and K. Tan. Generalized multidimensional data mapping and query processing. ACM Transactions on Database Systems, 30(3):661--697, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. R. Zhang, B. C. Ooi, and K.-L. Tan. Making the pyramid technique robust to query types and workloads. In ICDE, pages 313--324, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. R. Zhang, J. Qi, M. Stradling, and J. Huang. Towards a painless index for spatial objects. ACM Transactions on Database Systems, 39(3):19:1--19:42, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Theoretically optimal and empirically efficient r-trees with strong parallelizability
    Index terms have been assigned to the content through auto-classification.

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image Proceedings of the VLDB Endowment
      Proceedings of the VLDB Endowment  Volume 11, Issue 5
      January 2018
      274 pages
      ISSN:2150-8097
      Issue’s Table of Contents

      Publisher

      VLDB Endowment

      Publication History

      • Published: 5 October 2018
      Published in pvldb Volume 11, Issue 5

      Qualifiers

      • research-article
    • Article Metrics

      • Downloads (Last 12 months)24
      • Downloads (Last 6 weeks)2

      Other Metrics

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader