DOI: 10.1145/2744769.2744884
research-article

Optimizing stream program performance on CGRA-based systems

Published: 07 June 2015

Abstract

Coarse-Grained Reconfigurable Architectures (CGRAs), often used as coprocessors for DSP and multimedia kernels, can deliver highly energy-efficient execution for compute-intensive kernels. Stream applications, which consist of many actors connected by channels, provide a natural representation for DSP applications and are therefore a good match for CGRAs. We present our flow for mapping DSP applications written in the StreamIt language to CGRAs, along with the results. One important challenge in mapping is managing the application's many kernels within the limited local memory of a CGRA, for which we present a novel integer linear programming-based solution. Our evaluation results demonstrate that our software and hardware optimizations can help generate highly efficient mappings of stream applications to CGRAs, enabling far more energy-efficient execution than state-of-the-art GP-GPUs (ranging from 7× worse to 50× better).
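
To make the actor-and-channel model concrete, here is a minimal sketch of a linear stream pipeline in which each actor consumes (pops) and produces (pushes) a fixed number of items per firing, as StreamIt filters do. It computes how often each actor must fire so that every channel is balanced over one steady-state iteration. The actor names, the rates, and the function `steady_state_repetitions` are hypothetical, and this is only an illustration of the programming model, not the mapping flow or the ILP formulation of the paper.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+


def steady_state_repetitions(actors):
    """Return the smallest firing counts that balance every channel.

    `actors` is a list of (name, pop, push) tuples in pipeline order
    (hypothetical representation): `pop` items are consumed and `push`
    items are produced per firing. The source's pop rate and the sink's
    push rate are ignored.
    """
    # Firing rate of each actor relative to the first one.
    rates = [Fraction(1)]
    for (_, _, push), (_, pop, _) in zip(actors, actors[1:]):
        # Balance: producer_firings * push == consumer_firings * pop
        rates.append(rates[-1] * push / pop)
    # Scale the relative rates to the smallest all-integer solution.
    scale = lcm(*(r.denominator for r in rates))
    return {name: int(r * scale) for (name, _, _), r in zip(actors, rates)}


# Hypothetical three-actor pipeline: Source -> Decimate -> Sink
pipeline = [("Source", 0, 2), ("Decimate", 3, 1), ("Sink", 2, 0)]
print(steady_state_repetitions(pipeline))
```

With these assumed rates, Source fires three times (6 items), Decimate fires twice (consuming 6, producing 2), and Sink fires once (consuming 2), so no channel grows or starves across iterations.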


Published In

DAC '15: Proceedings of the 52nd Annual Design Automation Conference
June 2015
1204 pages
ISBN:9781450335201
DOI:10.1145/2744769

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Funding Sources

  • NRF

Conference

DAC '15
DAC '15: The 52nd Annual Design Automation Conference 2015
June 7 - 11, 2015
San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
