A compiler approach to managing storage and memory bandwidth in configurable architectures

Published: 03 October 2008

Abstract

Configurable architectures offer the unique opportunity of realizing hardware designs tailored to the specific data and computational patterns of an application code. Customizing the storage structures is becoming increasingly important in mitigating the continuing gap between memory latencies and internal computing speeds. In this article we describe and evaluate a compiler algorithm that maps the arrays of a loop-based computation to internal storage structures, either RAM blocks or discrete registers. Our objective is to minimize the overall execution time while considering the capacity and bandwidth constraints of the storage resources. The novelty of our approach lies in creating a single framework that combines high-level compiler techniques with lower-level scheduling information for mapping the data. We illustrate the benefits of our approach for a set of image/signal processing kernels using a Xilinx Virtex™ Field-Programmable Gate Array (FPGA). Our algorithm leads to faster designs compared to the state-of-the-art custom data layout mapping technique, in some instances using less storage. When compared to hand-coded designs, our results are comparable in terms of execution time and resources, but are derived in a minute fraction of the design time.
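The mapping problem the abstract describes can be illustrated with a toy model. The sketch below is not the article's algorithm. It is a brute-force search under simplifying assumptions introduced here for illustration: arrays held in registers cost no extra access cycles, each RAM block serves a fixed number of accesses per cycle, and RAM blocks operate in parallel. All names (`access_cycles`, `best_mapping`, the array names, the capacities) are hypothetical.

```python
from itertools import product
from math import ceil


def access_cycles(mapping, accesses, ports_per_ram):
    """Cycles spent on memory accesses in one loop iteration (toy model)."""
    per_ram = {}
    for name, target in mapping.items():
        if target != "reg":  # assumption: register reads are free
            per_ram[target] = per_ram.get(target, 0) + accesses[name]
    # RAM blocks work in parallel, so the busiest block sets the pace.
    return max((ceil(n / ports_per_ram) for n in per_ram.values()), default=0)


def best_mapping(sizes, accesses, reg_capacity, rams, ram_capacity, ports_per_ram=2):
    """Try every array-to-storage assignment and keep the cheapest feasible one.

    Cost is (access cycles, registers used), so ties on time favor less storage.
    Returns None when no assignment satisfies the capacity constraints.
    """
    names = sorted(sizes)
    targets = ["reg"] + list(rams)
    best = None
    for choice in product(targets, repeat=len(names)):
        mapping = dict(zip(names, choice))
        used = {}
        for name, target in mapping.items():
            used[target] = used.get(target, 0) + sizes[name]
        if used.get("reg", 0) > reg_capacity:
            continue  # register budget exceeded
        if any(used.get(r, 0) > ram_capacity for r in rams):
            continue  # some RAM block overflows
        cost = (access_cycles(mapping, accesses, ports_per_ram), used.get("reg", 0))
        if best is None or cost < best[0]:
            best = (cost, mapping)
    return best


# Hypothetical kernel: 8 coefficients reused every iteration, one large
# input array read 8 times per iteration, one output written once.
sizes = {"coeff": 8, "image": 4096, "out": 4096}      # words
accesses = {"coeff": 8, "image": 8, "out": 1}         # per loop iteration

(cycles, regs_used), mapping = best_mapping(
    sizes, accesses, reg_capacity=64, rams=["ram0", "ram1"], ram_capacity=8192
)
print(cycles, regs_used, mapping)
```

In this made-up instance the search places the small, heavily reused `coeff` array in registers and splits the two large arrays across the RAM blocks, giving 4 access cycles per iteration; with no registers available (`reg_capacity=0`) the best the same model finds is 5. Exhaustive search only scales to a handful of arrays, which is one reason a practical compiler would rely on analysis and scheduling information instead.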

Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 13, Issue 4
September 2008
328 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/1391962

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 October 2008
Accepted: 01 April 2008
Revised: 01 February 2008
Received: 01 February 2007
Published in TODAES Volume 13, Issue 4

Author Tags

  1. Compiler analysis
  2. configurable architectures
  3. high-level hardware synthesis
  4. storage allocation and management

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2020) FPGA Memory Optimization in High-Level Synthesis. FPGA Algorithms and Applications for the Internet of Things, pp. 51-81. DOI: 10.4018/978-1-5225-9806-0.ch003. Online publication date: 30-Mar-2020
  • (2017) COSMOS. ACM Transactions on Embedded Computing Systems 16:5s, pp. 1-22. DOI: 10.1145/3126566. Online publication date: 27-Sep-2017
  • (2017) System-Level Optimization of Accelerator Local Memory for Heterogeneous Systems-on-Chip. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 36:3, pp. 435-448. DOI: 10.1109/TCAD.2016.2611506. Online publication date: 1-Mar-2017
  • (2017) Broadening the exploration of the accelerator design space in embedded scalable platforms. 2017 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-7. DOI: 10.1109/HPEC.2017.8091091. Online publication date: Sep-2017
  • (2017) High-Level Synthesis for Semi-Global Matching: Is the Juice Worth the Squeeze? IEEE Access 5, pp. 8419-8432. DOI: 10.1109/ACCESS.2016.2635378. Online publication date: 2017
  • (2016) Memory Partitioning in the Limit. International Journal of Parallel Programming 44:2, pp. 337-380. DOI: 10.1007/s10766-015-0380-7. Online publication date: 1-Apr-2016
  • (2015) Memory Interface Design for 3D Stencil Kernels on a Massively Parallel Memory System. ACM Transactions on Reconfigurable Technology and Systems 8:4, pp. 1-24. DOI: 10.1145/2800788. Online publication date: 11-Sep-2015
  • (2015) Reconfigurable Computing Architectures. Proceedings of the IEEE 103:3, pp. 332-354. DOI: 10.1109/JPROC.2014.2386883. Online publication date: Mar-2015
  • (2014) System-level memory optimization for high-level synthesis of component-based SoCs. Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis, pp. 1-10. DOI: 10.1145/2656075.2656098. Online publication date: 12-Oct-2014
