skip to main content
10.1145/3178487.3178509acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections

PAM: parallel augmented maps

Published:10 February 2018Publication History

ABSTRACT

Ordered (key-value) maps are an important and widely-used data type for large-scale data processing frameworks. Beyond simple search, insertion and deletion, more advanced operations such as range extraction, filtering, and bulk updates form a critical part of these frameworks.

We describe an interface for ordered maps that is augmented to support fast range queries and sums, and introduce a parallel and concurrent library called PAM (Parallel Augmented Maps) that implements the interface. The interface includes a wide variety of functions on augmented maps ranging from basic insertion and deletion to more interesting functions such as union, intersection, filtering, extracting ranges, splitting, and range-sums. We describe algorithms for these functions that are efficient both in theory and practice.

As examples of the use of the interface and the performance of PAM we apply the library to four applications: simple range sums, interval trees, 2D range trees, and ranked word index searching. The interface greatly simplifies the implementation of these data structures over direct implementations. Sequentially the code achieves performance that matches or exceeds existing libraries designed specially for a single application, and in parallel our implementation gets speedups ranging from 40 to 90 on 72 cores with 2-way hyperthreading.

Skip Supplemental Material Section

Supplemental Material

References

  1. Georgy Adelson-Velsky and E. M. Landis. 1962. An Algorithm for the Organization of Information. Proc. of the USSR Academy of Sciences 145 (1962), 263--266. In Russian, English translation by Myron J. Ricci in Soviet Doklady, 3:1259--1263, 1962.Google ScholarGoogle Scholar
  2. Pankaj K. Agarwal, Kyle Fox, Kamesh Munagala, and Abhinandan Nath. 2016. Parallel Algorithms for Constructing Range and Nearest-Neighbor Searching Data Structures. In Proc. ACM SIGMOD-SIGACT-SIGAI Symp. on Principles of Database Systems (PODS). 429--440. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Yaroslav Akhremtsev and Peter Sanders. 2015. Fast Parallel Operations on Search Trees. arXiv preprint arXiv:1510.05433 (2015).Google ScholarGoogle Scholar
  4. Ahmed M. Aly, Hazem Elmeleegy, Yan Qi, and Walid Aref. 2016. Kangaroo: Workload-Aware Processing of Range Data and Range Queries in Hadoop. In Proc. ACM International Conference on Web Search and Data Mining (WSDM). ACM, New York, NY, USA, 397--406. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Flurry analytics. -. (-). https://developer.yahoo.com/flurry/docs/analytics/Google ScholarGoogle Scholar
  6. Antonio Barbuzzi, Pietro Michiardi, Ernst Biersack, and Gennaro Boggia. 2010. Parallel Bulk Insertion for Large-scale Analytics Applications. In Proc. International Workshop on Large Scale Distributed Systems and Middleware (LADIS). Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Dmitry Basin, Edward Bortnikov, Anastasia Braginsky, Guy Golan-Gueta, Eshcar Hillel, Idit Keidar, and Moshe Sulamy. 2017. KiWi: A Key-Value Map for Scalable Real-Time Analytics. In Proc. ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPOPP). 357--369. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Rudolf Bayer. 1972. Symmetric Binary B-Trees: Data Structure and Maintenance Algorithms. Acta Informatica 1 (1972), 290--306. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Jon Louis Bentley. 1979. Decomposable searching problems. Information processing letters 8, 5 (1979), 244--251.Google ScholarGoogle Scholar
  10. Juan Besa and Yadran Eterovic. 2013. A concurrent red-black tree. J. Parallel Distrib. Comput. 73, 4 (2013), 434--449. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Guy E Blelloch, Daniel Ferizovic, and Yihan Sun. 2016. Just Join for Parallel Ordered Sets. In Proc. of the ACM Symp. on Parallelism in Algorithms and Architectures (SPAA). 253--264. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Guy E. Blelloch and Margaret Reid-Miller. 1998. Fast Set Operations Using Treaps. In Proc. ACM Symp. on Parallel Algorithms and Architectures (SPAA). 16--26. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Nathan Grasso Bronson, Jared Casper, Hassan Chafi, and Kunle Olukotun. 2010. A practical concurrent binary search tree. In Proc. ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 257--268. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Trevor Brown and Hillel Avni. 2012. Range Queries in Non-blocking k-ary Search Trees. Springer Berlin Heidelberg, Berlin, Heidelberg, 31--45.Google ScholarGoogle Scholar
  15. Trevor Brown, Faith Ellen, and Eric Ruppert. 2014. A general technique for non-blocking trees. In Proc. ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 329--342. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Chee Yong Chan and Yannis E. Ioannidis. 1999. Hierarchical Prefix Cubes for Range-Sum Queries. In Proc. International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 675--686. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Rosetta Code.-. Inverted index. (-). https://rosettacode.org/wiki/Inverted_indexGoogle ScholarGoogle Scholar
  18. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. 2001. Introduction to Algorithms (second edition). MIT Press and McGraw-Hill. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Costin S. 2012. C# based interval tree. https://code.google.com/archive/p/intervaltree. (2012).Google ScholarGoogle Scholar
  20. Bolin Ding and Arnd Christian König. 2011. Fast set intersection in memory. Proc. of the VLDB Endowment 4, 4 (2011), 255--266. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Dana Drachsler, Martin T. Vechev, and Eran Yahav. 2014. Practical concurrent binary search trees via logical ordering. In Proc. ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 343--356. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. James R Driscoll, Neil Sarnak, Daniel Dominic Sleator, and Robert Endre Tarjan. 1986. Making data structures persistent. In Proc. ACM Symp. on Theory of computing (STOC). 109--121. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Herbert Edelsbrunner. 1980. Dynamic Rectangle Intersrction Searching. Technical Report Institute for Technical Processing Report 47. Technical University of Graz, Austria.Google ScholarGoogle Scholar
  24. Stephan Erb, Moritz Kobitzsch, and Peter Sanders. 2014. Parallel bi-objective shortest paths using weight-balanced b-trees with bulk updates. In Experimental Algorithms. Springer, 111--122. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Preparata Franco and Michael Ian Preparata Shamos. 1985. Computational geometry: an introduction. Springer Science & Business Media. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Leonor Frias and Johannes Singler. 2007. Parallelization of Bulk Operations for STL Dictionaries. In Euro-Par 2007 Workshops: Parallel Processing, HPPC 2007, UNICORE Summit 2007, and VHPC 2007. 49--58. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Hong Gao and Jian-Zhong Li. 2005. Parallel Data Cube Storage Structure for Range Sum Queries and Dynamic Updates. J. Comput. Sci. Technol. 20, 3 (May 2005), 345--356. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, and Hamid Pirahesh. 1997. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Data Mining and Knowledge Discovery 1, 1 (01 Mar 1997), 29--53. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Antonin Guttman. 1984. R-trees: A Dynamic Index Structure for Spatial Searching. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, USA, 47--57. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Chaim-Leib Halbet and Konstantin Tretyakov. 2015. Python based interval tree. https://github.com/chaimleib/intervaltree. (2015).Google ScholarGoogle Scholar
  31. Ching-Tien Ho, Rakesh Agrawal, Nimrod Megiddo, and Ramakrishnan Srikant. 1997. Range Queries in OLAP Data Cubes. In Proc. ACM International Conference on Management of Data (SIGMOD). ACM, New York, NY, USA, 73--88. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Paul Hudak. 1986. A Semantic Model of Reference Counting and Its Abstraction (Detailed Summary). In Proc. of the 1986 ACM Conference on LISP and Functional Programming (LFP '86). 351--363. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Hiroshi Inoue, Moriyoshi Ohara, and Kenjiro Taura. 2014. Faster set intersection with SIMD instructions by reducing branch mispredictions. Proc. of the VLDB Endowment 8, 3 (2014), 293--304. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Jinwoong Kim, Sul-Gi Kim, and Beomseok Nam. 2013. Parallel multi-dimensional range query processing with R-trees on GPU. J. Parallel and Distrib. Comput. 73, 8 (2013), 1195 -- 1207. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Hans-Peter Kriegel, Marco Pötke, and Thomas Seidl. 2000. Managing Intervals Efficiently in Object-Relational Databases. In Proc. International Conference on Very Large Data Bases (VLDB). 407--418. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. H. T. Kung and Philip L. Lehman. 1980. Concurrent Manipulation of Binary Search Trees. ACM Trans. Database Syst. 5, 3 (1980), 354--382. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Kim S. Larsen. 2000. AVL Trees with Relaxed Balance. J. Comput. Syst. Sci. 61, 3 (2000), 508--522. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Viktor Leis, Alfons Kemper, and Thomas Neumann. 2013. The adaptive radix tree: ARTful indexing for main-memory databases. In Data Engineering (ICDE), 2013 IEEE 29th International Conference on. IEEE, 38--49. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Charles E. Leiserson. 2010. The Cilk++ concurrency platform. The Journal of Supercomputing 51, 3 (2010), 244--257. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Justin J. Levandoski, David B. Lomet, and Sudipta Sengupta. 2013. The Bw-Tree: A B-tree for New Hardware Platforms. In Proc. IEEE International Conference on Data Engineering (ICDE). 302--313. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. LevelDB. -. (-). leveldb.orgGoogle ScholarGoogle Scholar
  42. Yandong Mao, Eddie Kohler, and Robert Tappan Morris. 2012. Cache craftiness for fast multicore key-value storage. In Proc. of the 7th ACM European Conference on Computer Systems. ACM, 183--196. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Mark De Berg Mark, Mark Van Kreveld, Mark Overmars, and Otfried Cheong Schwarzkopf. 2000. Computational Geometry. Springer.Google ScholarGoogle Scholar
  44. Edward M. McCreight. 1985. Priority Search Trees. SIAM J. Comput. 14, 2 (1985), 257--276.Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. David R Musser, Gillmer J Derge, and Atul Saini. 2009. STL tutorial and reference guide: C+ + programming with the standard template library. Addison-Wesley Professional. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Aravind Natarajan and Neeraj Mittal. 2014. Fast concurrent lock-free binary search trees. In Proc. ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPoPP). 317--328. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Gabriele Neyer. 2017. dD Range and Segment Trees. In CGAL User and Reference Manual (4.10 ed.). CGAL Editorial Board. http://doc.cgal.org/4.10/Manual/packages.html#PkgRangeSegmentTreesDSummaryGoogle ScholarGoogle Scholar
  48. Jürg Nievergelt and Edward M. Reingold. 1973. Binary Search Trees of Bounded Balance. SIAM J. Comput. 2, 1 (1973), 33--43.Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Chris Okasaki. 1999. Purely functional data structures. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Oracle. -. Oracle NoSQL. (-). https://www.oracle.com/database/nosql/index.htmlGoogle ScholarGoogle Scholar
  51. Mark H Overmars. 1996. Designing the computational geometry algorithms library CGAL. In Applied computational geometry towards geometric engineering. Springer, 53--58. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Heejin Park and Kunsoo Park. 2001. Parallel algorithms for red-black trees. Theoretical Computer Science 262, 1 (2001), 415--435. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Wolfgang J. Paul, Uzi Vishkin, and Hubert Wagener. 1983. Parallel Dictionaries in 2-3 Trees. In Proc. International Colloq. on Automata, Languages and Programming (ICALP). 597--609. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Chuck Pheatt. 2008. Intel® threading building blocks. Journal of Computing Sciences in Colleges 23, 4 (2008), 298--298. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Aleksandar Prokopec, Nathan Grasso Bronson, Phil Bagwell, and Martin Odersky. 2012. Concurrent Tries with Efficient Non-blocking Snapshots. In Proc. ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPOPP). Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Anand Rajaraman and Jeffrey David Ullman. 2011. Mining of Massive Datasets:. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. RocksDB. -. (-). rockdb.orgGoogle ScholarGoogle Scholar
  58. HananSamet. 1990. The design and analysis of spatial data structures. Vol. 199. Addison-Wesley Reading, MA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Raimund Seidel and Celcilia R. Aragon. 1996. Randomized Search Trees. Algorithmica 16 (1996), 464--497.Google ScholarGoogle ScholarCross RefCross Ref
  60. Jeff Shute, Radek Vingralek, Bart Samwel, Ben Handy, Chad Whipkey, Eric Rollins, Mircea Oancea, Kyle Littlefield, David Menestrina, Stephan Ellner, John Cieslewicz, Ian Rae, Traian Stancescu, and Himani Apte. 2013. F1: A Distributed SQL Database That Scales. Proc. VLDB Endow. 6, 11 (Aug. 2013), 1068--1079. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Johannes Singler, Peter Sanders, and Felix Putze. 2007. MCSTL: The Multi-core Standard Template Library. In Proc. European Conf. on Parallel Processing (Euro-Par). 682--694. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. TBB -. Intel® Threading Building Blocks 2.0 for Open Source. (-). https://www.threadingbuildingblocks.org/.Google ScholarGoogle Scholar
  63. Ziqi Wang, Andrew Pavlo, Hyeontaek Lim, Viktor Leis, Huanchen Zhang, Michael Kaminsky, and David G. Andersen. {n. d.}. Building A Bw-Tree Takes More Than Just Buzz Words {Experiments and Analyses} (Unpublished). ({n. d.}).Google ScholarGoogle Scholar
  64. Wikimedia Foundation. 2016. Wikepedia:Database download. https://en.wikipedia.org/wiki/Wikipedia:Database_download. (2016).Google ScholarGoogle Scholar
  65. Huanchen Zhang, David G Andersen, Andrew Pavlo, Michael Kaminsky, Lin Ma, and Rui Shen. 2016. Reducing the storage overhead of main-memory OLTP databases with hybrid indexes. In Proc. of the 2016 International Conference on Management of Data. ACM, 1567--1581. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Justin Zobel and Alistair Moffat. 2006. Inverted Files for Text Search Engines. ACM Comput. Surv. 38, 2, Article 6 (July 2006). Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. PAM: parallel augmented maps

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
        February 2018
        442 pages
        ISBN:9781450349826
        DOI:10.1145/3178487
        • cover image ACM SIGPLAN Notices
          ACM SIGPLAN Notices  Volume 53, Issue 1
          PPoPP '18
          January 2018
          426 pages
          ISSN:0362-1340
          EISSN:1558-1160
          DOI:10.1145/3200691
          Issue’s Table of Contents

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 10 February 2018

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate230of1,014submissions,23%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader