research-article

Polyhedral-based data reuse optimization for configurable computing

Authors:
Louis-Noel Pouchet, University of California Los Angeles, Los Angeles, CA, USA
Peng Zhang, University of California Los Angeles, Los Angeles, CA, USA
P. Sadayappan, Ohio State University, Columbus, OH, USA
Jason Cong, University of California Los Angeles, Los Angeles, CA, USA

FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays, February 2013, pages 29–38. https://doi.org/10.1145/2435264.2435273

Published: 11 February 2013

ABSTRACT

Many applications, such as medical imaging, generate intensive data traffic between the FPGA and off-chip memory. Execution time can be improved significantly by making effective use of on-chip (scratchpad) memories, combined with careful software-based data reuse and communication scheduling. We present a fully automated C-to-FPGA framework that addresses this problem. The framework implements data reuse through aggressive program restructuring based on loop transformations. It also automatically applies other performance-critical optimizations, such as task-level parallelization, loop pipelining, and data prefetching.
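
Data reuse through loop restructuring works by staging off-chip data in on-chip buffers and then reusing it many times before it is evicted. Below is a minimal hand-written sketch of that idea for matrix multiplication. It is not output from the paper's framework. It uses classic rectangular tiling with explicit copy-in and copy-out of T x T tiles, and the local arrays stand in for FPGA on-chip memory. The sizes N and T, the function matmul_tiled, and the counter offchip_transfers are all hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Hypothetical sizes, chosen only for illustration. */
#define N 64   /* problem size */
#define T 16   /* tile size; must divide N */

/* Counts elements moved between off-chip arrays and on-chip buffers. */
static long offchip_transfers;

/* C = A * B, tiled so that each T x T block of A, B and C is staged in a
 * local buffer (a stand-in for on-chip scratchpad) and reused from there. */
void matmul_tiled(float A[N][N], float B[N][N], float C[N][N])
{
    assert(N % T == 0);
    float a_buf[T][T], b_buf[T][T], c_buf[T][T];
    offchip_transfers = 0;

    for (int ii = 0; ii < N; ii += T)
        for (int jj = 0; jj < N; jj += T) {
            /* The output tile stays on chip for the whole kk loop. */
            for (int i = 0; i < T; i++)
                for (int j = 0; j < T; j++)
                    c_buf[i][j] = 0.0f;

            for (int kk = 0; kk < N; kk += T) {
                /* Copy-in: fetch one tile of A and one tile of B. */
                for (int r = 0; r < T; r++)
                    for (int c = 0; c < T; c++) {
                        a_buf[r][c] = A[ii + r][kk + c];
                        b_buf[r][c] = B[kk + r][jj + c];
                    }
                offchip_transfers += 2L * T * T;

                /* Compute only from on-chip buffers: each element
                 * loaded above is read T times here. */
                for (int i = 0; i < T; i++)
                    for (int j = 0; j < T; j++) {
                        float acc = c_buf[i][j];
                        for (int k = 0; k < T; k++)
                            acc += a_buf[i][k] * b_buf[k][j];
                        c_buf[i][j] = acc;
                    }
            }

            /* Copy-out: write each finished output tile back once. */
            for (int i = 0; i < T; i++)
                for (int j = 0; j < T; j++)
                    C[ii + i][jj + j] = c_buf[i][j];
            offchip_transfers += (long)T * T;
        }
}
```

With N = 64 and T = 16, offchip_transfers finishes at 2·64³/16 + 64² = 36,864 elements. If every multiply-add fetched both operands from off-chip memory, the count would instead be about 2·64³ = 524,288. The tile size therefore trades on-chip buffer space (three T x T buffers) against off-chip traffic.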

We leverage the power and expressiveness of the polyhedral compilation model to develop a multi-objective optimization system for managing off-chip communication. Our technique can satisfy hardware resource constraints (scratchpad size) while still aggressively exploiting data reuse. Conversely, the approach can also reduce the on-chip buffer size subject to a bandwidth constraint. We also implement a fast design space exploration technique that optimizes program performance using the Xilinx high-level synthesis tool.
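
The two objectives described above are (a) minimizing off-chip traffic under a scratchpad capacity and (b) minimizing buffer size under a bandwidth budget. Both can be illustrated by searching over tile sizes. The sketch below is not the authors' polyhedral cost model. It swaps in a closed-form model for one kernel, an N x N matrix multiply with square T x T tiles and three on-chip tile buffers, and enumerates candidate T exhaustively. The footprint formula 3·T², the traffic formula 2·N³/T + N², and all function names are assumptions made for illustration.

```c
#include <assert.h>

/* Assumed model: N x N matmul, T x T tiles of A, B and C held on chip;
 * A and B tiles are re-fetched per kk step, C tiles written back once. */
static long footprint_elems(long T) { return 3 * T * T; }
static long traffic_elems(long N, long T) { return 2 * N * N * N / T + N * N; }

/* Objective 1: least off-chip traffic subject to a scratchpad capacity.
 * Returns the chosen tile size, or 0 if no candidate fits. */
long best_tile_for_capacity(long N, long capacity_elems)
{
    assert(N > 0);
    long best = 0;
    for (long T = 1; T <= N; T++) {
        if (N % T != 0 || footprint_elems(T) > capacity_elems)
            continue;   /* tile does not divide N, or exceeds capacity */
        if (best == 0 || traffic_elems(N, T) < traffic_elems(N, best))
            best = T;
    }
    return best;
}

/* Objective 2: smallest on-chip buffer subject to a traffic budget.
 * Footprint grows with T, so the first feasible T is the smallest buffer.
 * Returns 0 if no tile size meets the budget. */
long best_tile_for_bandwidth(long N, long max_traffic_elems)
{
    assert(N > 0);
    for (long T = 1; T <= N; T++)
        if (N % T == 0 && traffic_elems(N, T) <= max_traffic_elems)
            return T;
    return 0;
}
```

Under this model with N = 64, a 1,024-element capacity selects T = 16 (a footprint of 768 elements). A traffic budget of 70,000 elements selects T = 8, the smallest tile whose traffic (69,632) fits. A real flow would follow this kind of analytical pruning with HLS runs on the surviving candidates. That step is outside this sketch.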

Index Terms

1. Hardware
  1. Electronic design automation
    1. High-level and register-transfer level synthesis
      1. Datapath optimization
    2. Logic synthesis
      1. Circuit optimization
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published in
FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
February 2013, 294 pages
ISBN: 9781450318877
DOI: 10.1145/2435264
General Chair: Brad Hutchings, Brigham Young University, USA
Program Chair: Vaughn Betz, University of Toronto, Canada
Copyright © 2013 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
compilation
data reuse
high-level synthesis
program transformations
Acceptance Rates
Overall acceptance rate: 125 of 627 submissions, 20%
Article Metrics
- 126 total citations
- 1,328 total downloads
- Downloads (last 12 months): 109
- Downloads (last 6 weeks): 16