skip to main content
10.1145/3035918.3035946acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

FPGA-based Data Partitioning

Published:09 May 2017Publication History

ABSTRACT

Implementing parallel operators in multi-core machines often involves a data partitioning step that divides the data into cache-size blocks and arranges them so to allow concurrent threads to process them in parallel. Data partitioning is expensive, in some cases up to 90% of the cost of, e.g., a parallel hash join. In this paper we explore the use of an FPGA to accelerate data partitioning. We do so in the context of new hybrid architectures where the FPGA is located as a co-processor residing on a socket and with coherent access to the same memory as the CPU residing on the other socket. Such an architecture reduces data transfer overheads between the CPU and the FPGA, enabling hybrid operator execution where the partitioning happens on the FPGA and the build and probe phases of a join happen on the CPU. Our experiments demonstrate that FPGA-based partitioning is significantly faster and more robust than CPU-based partitioning. The results open interesting options as FPGAs are gradually integrated tighter with the CPU.

References

  1. I. Absalyamov, R. Halstead, P. Budhkar, W. Najjar, et al. Fpga-accelerated group-by aggregation using synchronizing caches. In ACM DaMoN, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. Appleby. https://github.com/aappleby/smhasher. January 2016.Google ScholarGoogle Scholar
  3. C. Balkesen, G. Alonso, J. Teubner, and M. T. Özsu. Multi-core, main-memory joins: Sort vs. hash revisited. PVLDB, 7(1):85--96, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. C. Balkesen, J. Teubner, G. Alonso, and M. T. Özsu. Main-memory hash joins on multi-core cpus: Tuning to the underlying hardware. In IEEE ICDE, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Barber, G. Lohman, I. Pandis, et al. Memory-efficient hash joins. PVLDB, 8(4):353--364, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. Barthels, S. Loesing, G. Alonso, and D. Kossmann. Rack-scale in-memory join processing using rdma. In ACM SIGMOD, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. C. Barthels, I. Müller, T. Schneider, G. Alonso, and T. Hoefler. Distributed join algorithms on thousands of cores. PVLDB, 10(5), 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Blanas, Y. Li, and J. M. Patel. Design and evaluation of main memory hash join algorithms for multi-core cpus. In ACM SIGMOD, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Casper and K. Olukotun. Hardware acceleration of database operations. In ACM FPGA, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. Chen and V. K. Prasanna. Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform. IEEE FCCM, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  11. R. J. Halstead, I. Absalyamov, W. A. Najjar, and V. J. Tsotras. FPGA-based Multithreading for In-Memory Hash Joins. In CIDR, 2015.Google ScholarGoogle Scholar
  12. R. J. Halstead, B. Sukhwani, H. Min, M. Thoennes, et al. Accelerating join operation for relational databases with FPGAs. In IEEE FCCM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. He, S. Zhang, and B. He. In-cache query co-processing on coupled cpu-gpu architectures. PVLDB, 8(4):329--340, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Z. Istvan, D. Sidler, and G. Alonso. Runtime parameterizable regular expression operators for databases. 2016.Google ScholarGoogle ScholarCross RefCross Ref
  15. Z. Istvan, L. Woods, and G. Alonso. Histograms as a side effect of data movement for big data. In ACM SIGMOD, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. S. Jha, B. He, M. Lu, X. Cheng, and H. P. Huynh. Improving main memory hash joins on intel xeon phi processors: An experimental approach. PVLDB, 8(6):642--653, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. T. Kaldewey, G. Lohman, R. Mueller, and P. Volk. Gpu join processing revisited. In ACM DaMoN, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. K. Kara and G. Alonso. Fast and robust hashing for database operators. In IEEE FPL, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  19. C. Kim, T. Kaldewey, V. W. Lee, E. Sedlar, et al. Sort vs. hash revisited: fast join implementation on modern multi-core cpus. PVLDB, 2(2):1378--1389, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. H. Lang, V. Leis, M.-C. Albutiu, T. Neumann, and A. Kemper. Massively parallel numa-aware hash joins. In In Memory Data Management and Analysis. 2015.Google ScholarGoogle ScholarCross RefCross Ref
  21. S. Manegold, P. Boncz, and M. Kersten. Optimizing main-memory join on modern hardware. IEEE TKDE, 14(4):709--730, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. N. Mirzadeh, O. Kocberber, B. Falsafi, and B. Grot. Sort vs. hash join revisited for near-memory execution. In ASBD, 2015.Google ScholarGoogle Scholar
  23. R. Mueller, J. Teubner, and G. Alonso. Glacier: a query-to-hardware compiler. In ACM SIGMOD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. N. Oliver et al. A reconfigurable computing system based on a cache-coherent fabric. In IEEE ReConFig, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. H. Pirk, S. Manegold, and M. Kersten. Waste not... efficient co-processing of relational data. In IEEE ICDE, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  26. O. Polychroniou, A. Raghavan, and K. A. Ross. Rethinking SIMD vectorization for in-memory databases. In ACM SIGMOD, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. O. Polychroniou and K. A. Ross. A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In ACM SIGMOD, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. A. Putnam, A. M. Caulfield, E. S. Chung, et al. A reconfigurable fabric for accelerating large-scale datacenter services. In ACM/IEEE ISCA, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S. Richter, V. Alvarez, and J. Dittrich. A seven-dimensional analysis of hashing methods and its implications on query processing. PVLDB, 9(3):96--107, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. N. Satish, C. Kim, J. Chhugani, A. D. Nguyen, et al. Fast sort on cpus and gpus: a case for bandwidth oblivious simd sort. In ACM SIGMOD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. S. Schuh, X. Chen, and J. Dittrich. An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory. In ACM SIGMOD, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. F. M. Schuhknecht, P. Khanchandani, and J. Dittrich. On the surprising difficulty of simple things: the case of radix partitioning. PVLDB, 8(9):934--937, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. D. Sidler, Z. Istvan, M. Owaida, and G. Alonso. Accelerating pattern matching queries in hybrid cpu-fpga architectures. In ACM SIGMOD, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. Stuecheli, B. Blaner, C. Johns, and M. Siegel. Capi: A coherent accelerator processor interface. IBM Journal of Research and Development, 59(1):7--1, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. B. Sukhwani, H. Min, M. Thoennes, et al. Database analytics acceleration using fpgas. In ACM PACT, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. T. Ueda, M. Ito, and M. Ohara. A Dynamically Reconfigurable Equi-Joiner on FPGA. IBM Tehnical Report RT0969, 2015.Google ScholarGoogle Scholar
  37. Z. Wang, B. He, and W. Zhang. A study of data partitioning on OpenCL-based FPGAs. In IEEE FPL, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  38. J. Wassenberg and P. Sanders. Engineering a multi-core radix sort. In Euro-Par, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. S. Werner, S. Groppe, V. Linnemann, and T. Pionteck. Hardware-accelerated join processing in large Semantic Web databases with FPGAs. In IEEE HPCS, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  40. L. Woods, G. Alonso, and J. Teubner. Parallel computation of skyline queries. In IEEE FCCM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. L. Wu, R. J. Barker, M. A. Kim, and K. A. Ross. Navigating big data with high-throughput, energy-efficient data partitioning. In ACM SIGARCH, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. FPGA-based Data Partitioning

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
        May 2017
        1810 pages
        ISBN:9781450341974
        DOI:10.1145/3035918

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 9 May 2017

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader