skip to main content
research-article
Public Access

Distributed Joins and Data Placement for Minimal Network Traffic

Published:16 November 2018Publication History
Skip Abstract Section

Abstract

Network communication is the slowest component of many operators in distributed parallel databases deployed for large-scale analytics. Whereas considerable work has focused on speeding up databases on modern hardware, communication reduction has received less attention. Existing parallel DBMSs rely on algorithms designed for disks with minor modifications for networks. A more complicated algorithm may burden the CPUs but could avoid redundant transfers of tuples across the network. We introduce track join, a new distributed join algorithm that minimizes network traffic by generating an optimal transfer schedule for each distinct join key. Track join extends the trade-off options between CPU and network. Track join explicitly detects and exploits locality, also allowing for advanced placement of tuples beyond hash partitioning on a single attribute. We propose a novel data placement algorithm based on track join that minimizes the total network cost of multiple joins across different dimensions in an analytical workload. Our evaluation shows that track join outperforms hash join on the most expensive queries of real workloads regarding both network traffic and execution time. Finally, we show that our data placement optimization approach is both robust and effective in minimizing the total network cost of joins in analytical workloads.

References

  1. Foto N. Afrati and Jeffrey D. Ullman. 2010. Optimizing joins in a map-reduce environment. In Proceedings of the EDBT. 99--110. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Martina-Cezara Albutiu, Alfons Kemper, and Thomas Neumann. 2012. Massively parallel sort-merge joins in main memory multi-core database systems. PVLDB 5, 10 (June 2012), 1064--1075. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Cagri Balkesen, Gustavo Alonso, Jens Teubner, and M. Tamer Ozsu. 2013. MultiCore, main-memory joins: Sort vs. hash revisited. PVLDB 7, 1 (Sept. 2013), 85--96. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Philip A. Bernstein and D. W. Chiu. 1981. Using semi-joins to solve relational queries. J. ACM 28, 1 (Jan. 1981). Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Philip A. Bernstein, Nathan Goodman, Eugene Wong, Christopher L. Reeve, and James B. Rothnie. 1981. Query processing in a system for distributed databases. ACM Trans. Database Syst. 6, 4 (Dec. 1981), 602--625. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Burton H. Bloom. 1970. Space/time trade-offs in hash coding with allowable errors. Comm. ACM 13, 7 (July 1970). Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Surajit Chaudhuri and Vivek Narasayya. 2016. TPC-H Data Generation with Skew. Retrieved from https://www.microsoft.com/en-us/download/details.aspx?id=52430.Google ScholarGoogle Scholar
  8. Shumo Chu, Magdalena Balazinska, and Dan Suciu. 2015. From theory to practice: Efficient join query evaluation in a parallel database system. In Proceedings of the SIGMOD. 63--78. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Transaction Processing Performance Council. 2014. The TPC-H Benchmark, 2.17.1. Retrieved from http://www.tpc.org/tpch.Google ScholarGoogle Scholar
  10. Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden. 2010. Schism: A workload-driven approach to database replication and partitioning. PVLDB 3, 1--2 (Sept. 2010), 48--57. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the OSDI. 137--150. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, and Rick Rasmussen. 1990. The gamma database machine project. IEEE Trans. Knowl. Data Engin. 2, 1 (Mar. 1990), 44--62. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. David J. DeWitt and Jim Gray. 1992. Parallel database systems: The future of high performance database systems. Comm. ACM 35 (1992), 85--98. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. David J. DeWitt, Randy H. Katz, Frank Olken, Leonard D. Shapiro, Michael R. Stonebraker, and David A. Wood. 1984. Implementation techniques for main memory database systems. In Proceedings of the SIGMOD. 1--8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Jennie Duggan, Olga Papaemmanouil, Leilani Battle, and Michael Stonebraker. 2015. Skew-aware join optimization for array databases. In Proceedings of the SIGMOD. 123--135. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. George Eadon and others. 2008. Supporting table partitioning by reference in oracle. In Proceedings of the SIGMOD. 1111--1122. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Mohammed Elseidy, Abdallah Elguindy, Aleksandar Vitorovic, and Christoph Koch. 2014. Scalable and adaptive online joins. PVLDB 7, 6 (Feb. 2014), 441--452. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Mohamed Y. Eltabakh, Yuanyuan Tian, Fatma Özcan, Rainer Gemulla, Aljoscha Krettek, and John McPherson. 2011. CoHadoop: Flexible data placement and its exploitation in Hadoop. PVLDB 4, 9 (June 2011), 575--585. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Robert Epstein, Michael Stonebraker, and Eugene Wong. 1978. Distributed query processing in a relational data base system. In Proceedings of the SIGMOD. 169--180. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Philip W. Frey, Romulo Goncalves, Martin Kersten, and Jens Teubner. 2009. Spinning relations: High-speed networks for distributed join processing. In Proceedings of the DaMoN. 27--33. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Lukasz Golab, Marios Hadjieleftheriou, Howard Karloff, and Barna Saha. 2014. Distributed data placement to minimize communication costs via graph partitioning. In Proceedings of the SSDBM. Article 20. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Martin Grund and others. 2010. HYRISE: A main memory hybrid storage engine. PVLDB 4, 2 (Nov. 2010), 105--116. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Ryan Johnson, Ippokratis Pandis, Nikos Hardavellas, Anastasia Ailamaki, and Babak Falsafi. 2009. Shore-MT: A scalable storage manager for the multicore era. In Proceedings of the EDBT. 24--35. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Changkyu Kim and others. 2009. Sort vs. hash revisited: Fast join implementation on modern multi-core CPUs. PVLDB 2, 2 (Aug. 2009), 1378--1389. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Changkyu Kim, Jongsoo Park, Nadathur Satish, Hongrae Lee, Pradeep Dubey, and Jatin Chhugani. 2012. CloudRAMsort: Fast and efficient large-scale distributed RAM sort on shared-nothing cluster. In Proceedings of the SIGMOD. 841--850. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Masaru Kitsuregawa, Hidehiko Tanaka, and Tohru Moto-Oka. 1983. Application of hash to data base machine and its architecture. New Gen. Comput. 1 (1983), 63--74.Google ScholarGoogle ScholarCross RefCross Ref
  27. K. Ashwin Kumar, Amol Deshpande, and Samir Khuller. 2013. Data placement and replica selection for improving co-location in distributed environments. CoRR abs/1302.4168 (2013). Retrieved from http://arxiv.org/abs/1302.4168.Google ScholarGoogle Scholar
  28. Viktor Leis et al. 2014. Morsel-driven parallelism: A NUMA-aware query evaluation framework for the many-core age. In Proceedings of the SIGMOD. 743--754. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Daniel Lemire et al. 2015. Decoding billions of integers per second through vectorization. Softw.: Pract. Exper. 45, 1 (Jan. 2015), 1--29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Zhe Li and Kenneth A. Ross. 1995. PERF join: An alternative to two-way semijoin and Bloomjoin. In Proceedings of the CIKM. 137--144. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Lothar F. Mackert and Guy M. Lohman. 1986. R<sup>*</sup> optimizer validation and performance evaluation for distributed queries. In Proceedings of the VLDB. 149--159. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Stefan Manegold, Peter A. Boncz, and Martin L. Kersten. 2000. Optimizing database architecture for the new bottleneck: Memory access. VLDB J. 9, 3 (2000), 231--246. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Manish Mehta and David J. DeWitt. 1997. Data placement in shared-nothing parallel database systems. VLDB J. 6, 1 (Feb. 1997). Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Ingo Müller et al. 2015. Cache-efficient aggregation: Hashing is sorting. In Proceedings of the SIGMOD. 1123--1136. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. J. K. Mullin. 1990. Optimal semijoins for distributed database systems. IEEE Trans. Softw. Eng. 16, 5 (May 1990). Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Rimma Nehme and Nicolas Bruno. 2011. Automated partitioning design in parallel database systems. In Proceedings of the SIGMOD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Alper Okcan and Mirek Riedewald. 2011. Processing theta-joins using mapreduce. In Proceedings of the SIGMOD. 949--960. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Andrew Pavlo, Carlo Curino, and Stanley Zdonik. 2012. Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems. In Proceedings of the SIGMOD. 61--72. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, and Michael Stonebraker. 2009. A comparison of approaches to large-scale data analysis. In Proceedings of the SIGMOD. 165--178. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Holger Pirk, Oscar Moll, Matei Zaharia, and Sam Madden. 2016. Voodoo—A vector algebra for portable database performance on modern hardware. PVLDB 9, 14 (Oct. 2016), 1707--1718. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Orestis Polychroniou, Arun Raghavan, and Kenneth A. Ross. 2015. Rethinking SIMD vectorization for in-memory databases. In Proceedings of the SIGMOD. 1493--1508. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Orestis Polychroniou and Kenneth A. Ross. 2014. A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In Proceedings of the SIGMOD. 755--766. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Orestis Polychroniou and Kenneth A. Ross. 2015. Efficient lightweight compression alongside fast scans. In Proceedings of the DaMoN. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Orestis Polychroniou, Rajkumar Sen, and Kenneth A. Ross. 2014. Track join: Distributed joins with minimal network traffic. In Proceedings of the SIGMOD. ACM, 1483--1494. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Abdul Quamar, K. Ashwin Kumar, and Amol Deshpande. 2013. SWORD: Scalable workload-aware data placement for transactional workloads. In Proceedings of the EDBT. 430--441. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Vijayshankar Raman et al. 2013. DB2 with BLU acceleration: So much more than just a column store. PVLDB 6, 11 (Aug. 2013), 1080--1091. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Jun Rao, Chun Zhang, Nimrod Megiddo, and Guy Lohman. 2002. Automating physical database design in a parallel database. In Proceedings of the SIGMOD. 558--569. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Wolf Rödiger, Sam Idicula, Alfons Kemper, and Thomas Neumann. 2016. Flow-join: Adaptive skew handling for distributed joins over high-speed networks. In Proceedings of the ICDE. 1194--1205.Google ScholarGoogle ScholarCross RefCross Ref
  49. Wolf Rödiger, Tobias Mühlbauer, Philipp Unterbrunner, Angelika Reiser, Alfons Kemper, and Thomas Neumann. 2014. Locality-sensitive operators for parallel main-memory database clusters. In Proceedings of the ICDE. 17--30.Google ScholarGoogle ScholarCross RefCross Ref
  50. N. Roussopoulos and H. Kang. 1991. A pipeline N-way join algorithm based on the 2-way semijoin program. IEEE Trans. Knowl. Data Engin. 3, 4 (Dec. 1991), 486--495. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Nadathur Satish, Changkyu Kim, Jatin Chhugani, Anthony D. Nguyen, Victor W. Lee, Daehyun Kim, and Pradeep Dubey. 2010. Fast sort on CPUs and GPUs: A case for bandwidth oblivious SIMD sort. In Proceedings of the SIGMOD. 351--362. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Donovan A. Schneider and David J. DeWitt. 1989. A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment. In Proceedings of the SIGMOD. 110--121. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. James W. Stamos and Honesty C. Young. 1993. A symmetric fragment and replicate algorithm for distributed joins. IEEE Trans. Parallel Distrib. Syst. 4, 12 (Dec. 1993), 1345--1354. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Thomas Stöhr et al. 2000. Multi-dimensional database allocation for parallel data warehouses. In Proceedings of the VLDB. 273--284. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Mike Stonebraker et al. 2005. C-store: A column-oriented DBMS. In Proceedings of the VLDB. 553--564. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Michael Stonebraker et al. 2007. The end of an architectural era: (It’s time for a complete rewrite). In Proceedings of the VLDB. 1150--1160. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Rebecca Taft et al. 2014. E-store: Fine-grained elastic partitioning for distributed transaction processing systems. PVLDB 8, 3 (Nov. 2014), 245--256. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Yuanyuan Tian, Fatma Özcan, Tao Zou, Romulo Goncalves, and Hamid Pirahesh. 2016. Building a hybrid warehouse: Efficient joins between data stored in HDFS and enterprise warehouse. ACM TODS 41, 4 (2016), 21. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Tolga Urhan and Michael J. Franklin. 2000. XJoin: A reactively scheduled pipelined join operator. IEEE Data Engin. Bulletin 23 (June 2000), 27--33.Google ScholarGoogle Scholar
  60. Jan Wassenberg and Peter Sanders. 2011. Engineering a multi core radix sort. In Proceedings of the EuroPar. 160--169. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Thomas Willhalm, Nicolae Popovici, Yazan Boshmaf, Hasso Plattner, Alexander Zeier, and Jan Schaffner. 2009. SIMD-scan: Ultra fast in-memory table scan using on-chip vector processing units. PVLDB 2, 1 (Aug. 2009), 385--394. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2013. Shark: SQL and rich analytics at scale. In Proceedings of the SIGMOD. 13--24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Yang Ye, Kenneth A. Ross, and Norases Vesdapunt. 2011. Scalable aggregation on multicore processors. In Proceedings of the DaMoN. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Feng Yu, Wen-Chi Hou, Cheng Luo, Dunren Che, and Mengxia Zhu. 2013. CS2: A new database synopsis for query estimation. In Proceedings of the SIGMOD. 469--480. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Erfan Zamanian, Carsten Binnig, and Abdallah Salama. 2015. Locality-aware partitioning in parallel database systems. In Proceedings of the SIGMOD. 17--30. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Distributed Joins and Data Placement for Minimal Network Traffic

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM Transactions on Database Systems
          ACM Transactions on Database Systems  Volume 43, Issue 3
          Best of PODS 2017, Best of ICDT 2017 and Regular Papers
          September 2018
          164 pages
          ISSN:0362-5915
          EISSN:1557-4644
          DOI:10.1145/3284689
          Issue’s Table of Contents

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 16 November 2018
          • Accepted: 1 July 2018
          • Revised: 1 June 2018
          • Received: 1 March 2017
          Published in tods Volume 43, Issue 3

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader