Parallelizing Data Processing on FPGAs with Shifter Lists

Authors:
Louis Woods, Systems Group, Department of Computer Science, ETH Zürich, Switzerland
Gustavo Alonso, Systems Group, Department of Computer Science, ETH Zürich, Switzerland
Jens Teubner, DBIS Group, Department of Computer Science, TU Dortmund University, Germany

ACM Transactions on Reconfigurable Technology and Systems, Volume 8, Issue 2, Article No. 7, pp. 1–22. https://doi.org/10.1145/2629551

Published: 31 March 2015

Abstract

Parallelism is currently seen as a mechanism to minimize the impact of the power and heat dissipation problems encountered in modern hardware. Data parallelism—based on partitioning the data—and pipeline parallelism—based on partitioning the computation—are the two main approaches to leverage parallelism on a wide range of hardware platforms.

Unfortunately, not all data processing problems are susceptible to either of those strategies. An example is the skyline operator [Börzsönyi et al. 2001], which computes the set of Pareto-optimal points within a multidimensional dataset. Existing approaches to parallelize the skyline operator are based on data parallelism. As a result, they suffer from a high overhead when merging intermediate results because of the lack of a global view of the problem inherent to partitioning the input data.
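To make the skyline operator and the merge overhead concrete, the following is a minimal software sketch, not the FPGA design described in this article. It assumes that smaller values are better in every dimension; the function names `dominates`, `skyline`, `local_skylines`, and `merge_skylines` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    # Assumption: smaller is better in every dimension.
    # p dominates q if p is no worse everywhere and strictly better somewhere.
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    # Naive quadratic scan: keep every point that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points)]


def local_skylines(points: Sequence[Point], parts: int) -> List[List[Point]]:
    # Data-parallel step: each partition computes its skyline in isolation,
    # without seeing the points held by the other partitions.
    return [skyline(points[i::parts]) for i in range(parts)]


def merge_skylines(local: Sequence[Sequence[Point]]) -> List[Point]:
    # Merge step: local winners must be compared against each other again,
    # because a point that survives locally may be dominated globally.
    candidates = [p for part in local for p in part]
    return skyline(candidates)


if __name__ == "__main__":
    points = [(1, 9), (4, 8), (2, 4), (6, 6), (3, 3), (5, 1)]
    local = local_skylines(points, parts=2)
    print(local)
    print(merge_skylines(local))
```

With the sample points above, the second partition keeps (4, 8) in its local skyline because it never sees (2, 4), which dominates it. Only the merge pass removes it, which is the extra work the paragraph above attributes to the missing global view.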

In this article, we show how to combine pipeline with data parallelism on a Field-Programmable Gate Array (FPGA) for a more efficient utilization of the available hardware parallelism. As we show in our experiments, skyline computation using our proposed technique scales linearly with the number of processing elements, and the performance we achieve on a rather small FPGA is comparable to that of a 64-core high-end server running a state-of-the-art data parallel implementation of skyline [Park et al. 2009].

The proposed approach to parallelize the skyline operator can be generalized to a wider range of data processing problems. We demonstrate this through a novel, highly parallel data structure, a shifter list, that can be efficiently implemented on an FPGA. The resulting template is easy to parametrize to implement a variety of computationally intensive operators such as frequent items, n-closest pairs, or K-means.

References

Ray Bittner. 2009. The speedy DDR2 controller for FPGAs. In Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA’09).
Shekhar Borkar and Andrew A. Chien. 2011. The future of microprocessors. Commun. ACM 54, 5 (May 2011).
Stephan Börzsönyi, Donald Kossmann, and Konrad Stocker. 2001. The skyline operator. In Proceedings of the 17th International Conference on Data Engineering (ICDE’01).
Sung-Ryoung Cho, Jongwuk Lee, Seung-Won Hwang, Hwansoo Han, and Sang-Won Lee. 2010. Vskyline: Vectorization for efficient skyline computation. SIGMOD Rec. 39, 2 (Dec. 2010).
Eric S. Chung, James C. Hoe, and Ken Mai. 2011. Coram: An in-fabric memory architecture for FPGA-based computing. In Proceedings of the 19th ACM SIGDA International Symposium on Field Programmable Gate Arrays (FPGA’11).
Convey Computer. 2014. Convey HC-2. Retrieved from http://www.conveycomputer.com.
Christopher Dennl, Daniel Ziener, and Jürgen Teich. 2012. On-the-fly composition of FPGA-based SQL query accelerators using a partially reconfigurable module library. In Proceedings of the 20th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’12).
Petros Drineas, Alan M. Frieze, Ravi Kannan, Santosh S. Vempala, and V. Vinay. 2004. Clustering large graphs via the singular value decomposition. Mach. Learn. 56, 1–3 (June 2004).
Ken Eguro. 2010. SIRC: An extensible reconfigurable computing communication API. In Proceedings of the 18th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’10).
Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2011. Dark silicon and the end of multicore scaling. In Proceedings of the 38th Symposium on Computer Architecture (ISCA’11).
Parke Godfrey, Ryan Shipley, and Jarek Gryz. 2005. Maximal vector computation in large data sets. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB’05).
Amir Hormati, Manjunath Kudlur, Scott Mahlke, David Bacon, and Rodric Rabbah. 2008. Optimus: Efficient realization of streaming applications on FPGAs. In Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES’08).
IBM. 2014. IBM Netezza Data Warehouse Appliances. Retrieved from http://www.ibm.com/software/data/netezza.
Hiroaki Inoue, Takashi Takenaka, and Masato Motomura. 2011. 20Gbps C-based complex event processing. In Proceedings of the 21st International Conference on Field Programmable Logic and Applications (FPL’11).
Gilles Kahn. 1974. The semantics of a simple language for parallel programming. In IFIP Congress.
Dirk Koch and Jim Torresen. 2011. FPGASort: A high performance sorting architecture exploiting run-time reconfiguration on FPGAs for large problem sorting. In Proceedings of the 19th ACM SIGDA International Symposium on Field Programmable Gate Arrays (FPGA’11).
Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. 2006. An integrated efficient solution for computing frequent and top-k elements in data streams. ACM Trans. Database Syst. (TODS) 31, 3 (Sept. 2006).
Roger Moussalli, Mariam Salloum, Walid A. Najjar, and Vassilis J. Tsotras. 2011. Massively parallel XML twig filtering using dynamic programming on FPGAs. In Proceedings of the 27th International Conference on Data Engineering (ICDE’11).
Sungwoo Park, Taekyung Kim, Jonghyun Park, Jinha Kim, and Hyeonseung Im. 2009. Parallel skyline computation on multicore architectures. In Proceedings of the 25th International Conference on Data Engineering (ICDE’09).
Parthasarathy Ranganathan. 2011. From microprocessors to nanostores: Rethinking data-centric systems. IEEE Comput. 44, 1 (Jan. 2011).
Satnam Singh. 2011. Computing without processors. Commun. ACM 54, 8 (Aug. 2011).
Bharat Sukhwani, Hong Min, Mathew Thoennes, Parijat Dube, Balakrishna Iyer, Bernard Brezzo, Donna Dillenberger, and Sameh Asaad. 2012. Database analytics acceleration using FPGAs. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques (PACT’12).
Jens Teubner, René Müller, and Gustavo Alonso. 2010. FPGA acceleration for the frequent item problem. In Proceedings of the 26th International Conference on Data Engineering (ICDE’10).
Riccardo Torlone and Paolo Ciaccia. 2002. Which are my preferred items. In Proceedings of the Workshop on Recommendation and Personalization in eCommerce (RPEC’02).

Index Terms

1. Information systems
  1. Data management systems

Published in
ACM Transactions on Reconfigurable Technology and Systems Volume 8, Issue 2
Special Section on FPL 2013
April 2015
129 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/2746532
Editor: Steve Wilton, Department of Electrical and Computer Engineering, University of British Columbia, Kaiser 4112, 5500-2332 Main Mall, Vancouver, BC V6T 1Z4, Canada
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 31 March 2015
- Accepted: 1 April 2014
- Revised: 1 February 2014
- Received: 1 September 2013
Author Tags
K-means
n-closest pairs
FPGA
database
frequent items
parallelism
pipeline
shifter list
skyline query
Qualifiers
- research-article
- Research
- Refereed
