research-article

FPGA-based Data Partitioning

Authors:
Kaan Kara

ETH Zürich, Zürich, Switzerland

ETH Zürich, Zürich, Switzerland
View Profile

,
Jana Giceva

ETH Zürich, Zürich, Switzerland

ETH Zürich, Zürich, Switzerland
View Profile

,
Gustavo Alonso

ETH Zürich, Zürich, Switzerland

ETH Zürich, Zürich, Switzerland
View Profile

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of DataMay 2017Pages 433–445https://doi.org/10.1145/3035918.3035946

Published:09 May 2017Publication History

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data

Pages 433–445

ABSTRACT

Implementing parallel operators in multi-core machines often involves a data partitioning step that divides the data into cache-size blocks and arranges them so to allow concurrent threads to process them in parallel. Data partitioning is expensive, in some cases up to 90% of the cost of, e.g., a parallel hash join. In this paper we explore the use of an FPGA to accelerate data partitioning. We do so in the context of new hybrid architectures where the FPGA is located as a co-processor residing on a socket and with coherent access to the same memory as the CPU residing on the other socket. Such an architecture reduces data transfer overheads between the CPU and the FPGA, enabling hybrid operator execution where the partitioning happens on the FPGA and the build and probe phases of a join happen on the CPU. Our experiments demonstrate that FPGA-based partitioning is significantly faster and more robust than CPU-based partitioning. The results open interesting options as FPGAs are gradually integrated tighter with the CPU.

References

I. Absalyamov, R. Halstead, P. Budhkar, W. Najjar, et al. Fpga-accelerated group-by aggregation using synchronizing caches. In ACM DaMoN, 2016. Google ScholarDigital Library
A. Appleby. https://github.com/aappleby/smhasher. January 2016.Google Scholar
C. Balkesen, G. Alonso, J. Teubner, and M. T. Özsu. Multi-core, main-memory joins: Sort vs. hash revisited. PVLDB, 7(1):85--96, 2013. Google ScholarDigital Library
C. Balkesen, J. Teubner, G. Alonso, and M. T. Özsu. Main-memory hash joins on multi-core cpus: Tuning to the underlying hardware. In IEEE ICDE, 2013. Google ScholarDigital Library
R. Barber, G. Lohman, I. Pandis, et al. Memory-efficient hash joins. PVLDB, 8(4):353--364, 2014. Google ScholarDigital Library
C. Barthels, S. Loesing, G. Alonso, and D. Kossmann. Rack-scale in-memory join processing using rdma. In ACM SIGMOD, 2015. Google ScholarDigital Library
C. Barthels, I. Müller, T. Schneider, G. Alonso, and T. Hoefler. Distributed join algorithms on thousands of cores. PVLDB, 10(5), 2017. Google ScholarDigital Library
S. Blanas, Y. Li, and J. M. Patel. Design and evaluation of main memory hash join algorithms for multi-core cpus. In ACM SIGMOD, 2011. Google ScholarDigital Library
J. Casper and K. Olukotun. Hardware acceleration of database operations. In ACM FPGA, 2014. Google ScholarDigital Library
R. Chen and V. K. Prasanna. Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform. IEEE FCCM, 2016.Google ScholarCross Ref
R. J. Halstead, I. Absalyamov, W. A. Najjar, and V. J. Tsotras. FPGA-based Multithreading for In-Memory Hash Joins. In CIDR, 2015.Google Scholar
R. J. Halstead, B. Sukhwani, H. Min, M. Thoennes, et al. Accelerating join operation for relational databases with FPGAs. In IEEE FCCM, 2013. Google ScholarDigital Library
J. He, S. Zhang, and B. He. In-cache query co-processing on coupled cpu-gpu architectures. PVLDB, 8(4):329--340, 2014. Google ScholarDigital Library
Z. Istvan, D. Sidler, and G. Alonso. Runtime parameterizable regular expression operators for databases. 2016.Google ScholarCross Ref
Z. Istvan, L. Woods, and G. Alonso. Histograms as a side effect of data movement for big data. In ACM SIGMOD, 2014. Google ScholarDigital Library
S. Jha, B. He, M. Lu, X. Cheng, and H. P. Huynh. Improving main memory hash joins on intel xeon phi processors: An experimental approach. PVLDB, 8(6):642--653, 2015. Google ScholarDigital Library
T. Kaldewey, G. Lohman, R. Mueller, and P. Volk. Gpu join processing revisited. In ACM DaMoN, 2012. Google ScholarDigital Library
K. Kara and G. Alonso. Fast and robust hashing for database operators. In IEEE FPL, 2016.Google ScholarCross Ref
C. Kim, T. Kaldewey, V. W. Lee, E. Sedlar, et al. Sort vs. hash revisited: fast join implementation on modern multi-core cpus. PVLDB, 2(2):1378--1389, 2009. Google ScholarDigital Library
H. Lang, V. Leis, M.-C. Albutiu, T. Neumann, and A. Kemper. Massively parallel numa-aware hash joins. In In Memory Data Management and Analysis. 2015.Google ScholarCross Ref
S. Manegold, P. Boncz, and M. Kersten. Optimizing main-memory join on modern hardware. IEEE TKDE, 14(4):709--730, 2002. Google ScholarDigital Library
N. Mirzadeh, O. Kocberber, B. Falsafi, and B. Grot. Sort vs. hash join revisited for near-memory execution. In ASBD, 2015.Google Scholar
R. Mueller, J. Teubner, and G. Alonso. Glacier: a query-to-hardware compiler. In ACM SIGMOD, 2010. Google ScholarDigital Library
N. Oliver et al. A reconfigurable computing system based on a cache-coherent fabric. In IEEE ReConFig, 2011. Google ScholarDigital Library
H. Pirk, S. Manegold, and M. Kersten. Waste not... efficient co-processing of relational data. In IEEE ICDE, 2014.Google ScholarCross Ref
O. Polychroniou, A. Raghavan, and K. A. Ross. Rethinking SIMD vectorization for in-memory databases. In ACM SIGMOD, 2015. Google ScholarDigital Library
O. Polychroniou and K. A. Ross. A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In ACM SIGMOD, 2014. Google ScholarDigital Library
A. Putnam, A. M. Caulfield, E. S. Chung, et al. A reconfigurable fabric for accelerating large-scale datacenter services. In ACM/IEEE ISCA, 2014. Google ScholarDigital Library
S. Richter, V. Alvarez, and J. Dittrich. A seven-dimensional analysis of hashing methods and its implications on query processing. PVLDB, 9(3):96--107, 2015. Google ScholarDigital Library
N. Satish, C. Kim, J. Chhugani, A. D. Nguyen, et al. Fast sort on cpus and gpus: a case for bandwidth oblivious simd sort. In ACM SIGMOD, 2010. Google ScholarDigital Library
S. Schuh, X. Chen, and J. Dittrich. An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory. In ACM SIGMOD, 2016. Google ScholarDigital Library
F. M. Schuhknecht, P. Khanchandani, and J. Dittrich. On the surprising difficulty of simple things: the case of radix partitioning. PVLDB, 8(9):934--937, 2015. Google ScholarDigital Library
D. Sidler, Z. Istvan, M. Owaida, and G. Alonso. Accelerating pattern matching queries in hybrid cpu-fpga architectures. In ACM SIGMOD, 2017. Google ScholarDigital Library
J. Stuecheli, B. Blaner, C. Johns, and M. Siegel. Capi: A coherent accelerator processor interface. IBM Journal of Research and Development, 59(1):7--1, 2015.Google ScholarDigital Library
B. Sukhwani, H. Min, M. Thoennes, et al. Database analytics acceleration using fpgas. In ACM PACT, 2012. Google ScholarDigital Library
T. Ueda, M. Ito, and M. Ohara. A Dynamically Reconfigurable Equi-Joiner on FPGA. IBM Tehnical Report RT0969, 2015.Google Scholar
Z. Wang, B. He, and W. Zhang. A study of data partitioning on OpenCL-based FPGAs. In IEEE FPL, 2015.Google ScholarCross Ref
J. Wassenberg and P. Sanders. Engineering a multi-core radix sort. In Euro-Par, 2011. Google ScholarDigital Library
S. Werner, S. Groppe, V. Linnemann, and T. Pionteck. Hardware-accelerated join processing in large Semantic Web databases with FPGAs. In IEEE HPCS, 2013.Google ScholarCross Ref
L. Woods, G. Alonso, and J. Teubner. Parallel computation of skyline queries. In IEEE FCCM, 2013. Google ScholarDigital Library
L. Wu, R. J. Barker, M. A. Kim, and K. A. Ross. Navigating big data with high-throughput, energy-efficient data partitioning. In ACM SIGARCH, 2013. Google ScholarDigital Library

Index Terms

FPGA-based Data Partitioning
1. Hardware
  1. Integrated circuits
    1. Reconfigurable logic and FPGAs
      1. Hardware accelerators
2. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
        Join algorithms

Recommendations

SAccO

This paper presents SAccO (Scalable Accelerator platform Osnabrück), a novel framework for implementing data-intensive applications using scalable and portable reconfigurable hardware accelerators. Instead of using expensive "reconfigurable ...
Read More
Hardware accelerated FPGA placement

A key advantage of field-programmable gate arrays (FPGAs) over full-custom and semi-custom devices is that they provide relatively quick implementation from concept to physical realization. However, as modern FPGAs reach close to one million logic ...
Read More
Optimizing CNN-based Segmentation with Deeply Customized Convolutional and Deconvolutional Architectures on FPGA
Special Issue on Deep learning on FPGAs

Convolutional Neural Networks-- (CNNs) based algorithms have been successful in solving image recognition problems, showing very large accuracy improvement. In recent years, deconvolution layers are widely used as key components in the state-of-the-art ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
May 2017
1810 pages
ISBN:9781450341974
DOI:10.1145/3035918
General Chairs:
Rada Chirkova
North Carolina State University, USA
,
Jun Yang
Duke University, USA
,
Program Chair:
Dan Suciu
University of Washington, USA
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 May 2017
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
data partitioning
fpga
hardware acceleration
hash join
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 36
  Total Citations
  View Citations
- 1,177
  Total Downloads
- Downloads (Last 12 months)66
- Downloads (Last 6 weeks)7
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

FPGA-based Data Partitioning

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

SAccO

Hardware accelerated FPGA placement

Optimizing CNN-based Segmentation with Deeply Customized Convolutional and Deconvolutional Architectures on FPGA