skip to main content
10.1145/1229428.1229446acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
Article

Automatic mapping of nested loops to FPGAS

Published: 14 March 2007 Publication History

Abstract

This paper present a framework for automatic mapping of perfectly nested loops with constant dependences onto regular processor arrays, suitable for direct implementation on Field Programmable Gate Arrays (FPGAs). The problem is modeled as that of finding a suitable completion procedure for a full-rank linear transformation on the iteration space. The approach enables extraction of necessary degrees of communication-free and pipelined parallelism to optimize performance under the resource constraints of limited logic resources and I/O bandwidth available on an FPGA. The generation of control signals for the custom processing elements is also addressed. Examples of automatic derivation of parallel designs for some common nested loops are provided. Experimental results on the Cray XD1 show that an FPGA-based matrix-multiplication design obtained using the framework attains significant speedup on the XD1's attached FPGA, when compared to execution on the XD1 CPU.

References

[1]
PolyLib - A library of polyhedral functions. http://icps.u-strasbg.fr/polylib/.
[2]
C. Ancourt and F. Irigoin. Scanning polyhedra with do loops. In PPoPP'91, pages 39--50, 1991.
[3]
R. Bagnara, P. Hill, and E. Zaffanella. PPL: The Parma Polyhedra Library. http://www.cs.unipr.it/ppl/.
[4]
M. Bednara and J. Teich. Automatic synthesis of FPGA processor arrays from loop algorithms. Journal of Supercomputing, 26(2):149--165, Sept. 2003.
[5]
U. Bondhugula, A. Devulapalli, J. Dinan, J. Fernando, P. Wyckoff, E. Stahlberg, and P. Sadayappan. Hardware/software integration for FPGA-based all pairs shortest-paths. In IEEE FCCM'06, Apr. 2006.
[6]
U. Bondhugula, A. Devulapalli, J. Fernando, P. Wyckoff, and P. Sadayappan. Parallel FPGA-based all-pairs shortest-paths in a directed graph. In IPDPS'06, Apr. 2006.
[7]
ClearSpeed Inc. ClearSpeed CSX 600, 2004. http://www.clearspeed.com/.
[8]
Cray Inc. Cray XD1 whitepaper, 2005. http://www.cray.com/products/xd1/.
[9]
A. Darte, S. Derrien, and T. Risset. Hardware/Software Interface for Multi-Dimensional Processor Arrays. In IEEE ASAP, pages 28--35, 2005.
[10]
A. Darte, R. Schreiber, B. R. Rau, and F. Vivien. A Constructive Solution to the Juggling Problem in Processor Array Synthesis. In IPDPS'00, pages 815--822, 2000.
[11]
Y. Dou, S. Vassiliadis, G. K. Kuzmanov, and G. N. Gaydadjiev. 64-bit floating-point FPGA matrix multiplication. In ACM FPGA'05, pages 86--95, 2005.
[12]
P. Feautrier. Some efficient solutions to the affine scheduling problem. part II. multidimensional time. International Journal of Parallel Programming, 21(6):389--420, 1992.
[13]
M. B. Gokhale, J. M. Stone, J. Arnold, and M. Kalinowski. Stream-Oriented FPGA computing in the Streams-C high level language. In FCCM '00, pages 49--56, 2000.
[14]
A.-C. Guillou, P. Quinton, and T. Risset. Hardware synthesis for multi-dimensional time. IEEE ASAP'03, 00:40, 2003.
[15]
Z. Guo, B. Buyukkurt, and W. Najjar. Input data reuse in compiling window operations onto reconfigurable hardware. In LCTES '04, pages 249--256, 2004.
[16]
IRISA COSI. The MMAlpha environment. http://www.irisa.fr/cosi/ALPHA/mmalpha_english.html.
[17]
B. Kienhuis, E. Rijpkema, and E. Deprettere. Compaan: Deriving process networks from Matlab for embedded signal processing architectures. In 8th International workshop on Hardware/software codesign, pages 13--17, 2000.
[18]
W. Li and K. Pingali. A singular loop transformation framework based on non-singular matrices. International Journal of Parallel Programming, 2 (2):183--205, 1994.
[19]
A. W. Lim, G. I. Cheong, and M. S. Lam. An affine partitioning algorithm to maximize parallelism and minimize communication. In ICS'99, pages 228--237, 1999.
[20]
A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine partitions. Parallel Computing, 24(3-4):445--475, 1998.
[21]
D. I. Moldovan and J. A. B. Fortes. Partitioning and mapping algorithms into fixed size systolic arrays. IEEE Transactions on Computers, 35(1):1--12, 1986.
[22]
P. Quinton. Automatic synthesis of systolic Arrays from uniform recurrent equations. In ISCA, pages 208--214, 1984.
[23]
R. Schreiber, S. Aditya, S. Mahlke, V. Kathail, B. R. Rau, D. Cronquist, and M. Sivaraman. PICO-NPA: High-Level synthesis of non-programmable hardware accelerators. J. VLSI Signal Process. Syst., 31(2):127--142, 2002.
[24]
A. Schrijver. Theory of Linear and Integer Programming. Wiley, 1987. SchRI a 87:1 1.Ex.
[25]
B. So, M. W. Hall, and P. C. Diniz. A compiler approach to fast hardware design space exploration in FPGA-based systems. In PLDI'02, pages 165--176, 2002.
[26]
J. Teich, L. Thiele, and L. Zhang. Partitioning processor arrays under resource constraints. Int. Journal on VLSI and Signal Processing Systems, 17. (1):5--20, 1997.
[27]
K. Underwood. FPGAs vs. CPUs: Trends in peak floating-point performance. In ACM FPGA'04, 2004.
[28]
K. D. Underwood and K. S. Hemmert. Closing the Gap: CPU and FPGA Trends in Sustainable Floating-Point BLAS Performance. In IEEE FCCM'04, Apr. 2004.
[29]
Xilinx Inc. Virtex-II Pro and Virtex-II Pro X Platform FPGAs: complete data sheet. Xilinx Inc., 2005.
[30]
K. Yotov, X. Li, G. Ren, M. Cibulskis, G. DeJong, M. Garzaran, D. A. Padua, K. Pingali, P. Stodghill, and P. Wu. A comparison of empirical and model driven optimization. In PLDI'03, pages 63--76, 2003.
[31]
L. Zhuo and V. K. Prasanna. High performance linear algebra on a reconfigurable supercomputer. In Supercomputing, Nov. 2005.

Cited By

View all
  • (2023)Auto-DOK: Compiler-Assisted Automatic Detection of Offload Kernels for FPGA-HBM Architectures2023 26th Euromicro Conference on Digital System Design (DSD)10.1109/DSD60849.2023.00085(577-584)Online publication date: 6-Sep-2023
  • (2022)HeteroFlowProceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays10.1145/3490422.3502369(78-88)Online publication date: 13-Feb-2022
  • (2022)HiMap: Fast and Scalable High-Quality Mapping on CGRA via Hierarchical AbstractionIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2021.313255141:10(3290-3303)Online publication date: Oct-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PPoPP '07: Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
March 2007
284 pages
ISBN:9781595936028
DOI:10.1145/1229428
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 March 2007

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. FPGA
  2. FPGA compilation
  3. control signals
  4. linear transformation
  5. nested loops
  6. regular processor arrays
  7. resource constraints
  8. scheduling

Qualifiers

  • Article

Conference

PPoPP07
Sponsor:

Acceptance Rates

PPoPP '07 Paper Acceptance Rate 22 of 65 submissions, 34%;
Overall Acceptance Rate 230 of 1,014 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)13
  • Downloads (Last 6 weeks)2
Reflects downloads up to 07 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Auto-DOK: Compiler-Assisted Automatic Detection of Offload Kernels for FPGA-HBM Architectures2023 26th Euromicro Conference on Digital System Design (DSD)10.1109/DSD60849.2023.00085(577-584)Online publication date: 6-Sep-2023
  • (2022)HeteroFlowProceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays10.1145/3490422.3502369(78-88)Online publication date: 13-Feb-2022
  • (2022)HiMap: Fast and Scalable High-Quality Mapping on CGRA via Hierarchical AbstractionIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2021.313255141:10(3290-3303)Online publication date: Oct-2022
  • (2021)Programming and Synthesis for Software-defined FPGA Acceleration: Status and Future ProspectsACM Transactions on Reconfigurable Technology and Systems10.1145/346966014:4(1-39)Online publication date: 13-Sep-2021
  • (2021)AutoSAThe 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays10.1145/3431920.3439292(93-104)Online publication date: 17-Feb-2021
  • (2021)Polyhedral-based Pipelining of Imperfectly-Nested Loop for CGRAs2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)10.1109/ICCAD51958.2021.9643542(1-9)Online publication date: 1-Nov-2021
  • (2020)SuSyProceedings of the 39th International Conference on Computer-Aided Design10.1145/3400302.3415644(1-9)Online publication date: 2-Nov-2020
  • (2016)A DSL Compiler for Accelerating Image Processing Pipelines on FPGAsProceedings of the 2016 International Conference on Parallel Architectures and Compilation10.1145/2967938.2967969(327-338)Online publication date: 11-Sep-2016
  • (2015)Processor arrays generation for matrix algorithms used in embedded platforms implemented on FPGAsMicroprocessors & Microsystems10.1016/j.micpro.2014.12.00339:7(576-588)Online publication date: 1-Oct-2015
  • (2014)Symbolic inner loop parallelisation for massively parallel processor arraysProceedings of the Twelfth ACM/IEEE Conference on Formal Methods and Models for Codesign10.1109/MEMCOD.2014.6961865(219-228)Online publication date: 1-Oct-2014
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media