research-article

Retargetable automatic generation of compound instructions for CGRA based reconfigurable processor applications

Published: 12 October 2014

Abstract

Reconfigurable processors such as SRP (Samsung Reconfigurable Processors) have become increasingly important because they offer just enough flexibility to accept software solutions while providing application-specific hardware configurability, enabling faster time-to-market, lower development cost, and higher performance with lower energy consumption and area. The reconfigurable processor compilation framework supports a wide range of architectures through an architecture description template, for application domains such as image processing, multimedia, video, and graphics. These architectures support several domain-specific compound instructions (also called intrinsics), which are computationally efficient compared with the set of general instructions in the processor. Application developers have to use these intrinsics in their programs according to the architecture, which can lead to very inefficient usage and is tedious and error-prone. Moreover, using the intrinsics provided by the architecture requires constant reference to the intrinsics file during development. In this paper, we propose a novel retargetable methodology for automatically generating compound instructions at compile time for a given architecture and application source code. Our approach can consider ~75% of the total intrinsics in the architectures, with a success rate of > 90% in identifying the intrinsics in benchmarks such as AVC, OpenGL Full Engine, and OpenGL Vector.
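To make the idea of recognizing a compound instruction in ordinary source code concrete, the sketch below matches a multiply-then-add expression tree against a pattern and replaces it with a single fused operation. It is only an illustration of the general technique of tree pattern matching, not the methodology of the paper, which the abstract does not describe in detail. The `Op` type, the `PATTERNS` table, and the intrinsic name `mad` are hypothetical and do not come from any SRP intrinsics file.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Op:
    """An operation node; leaves are plain strings (variable names)."""
    name: str
    args: Tuple["Expr", ...]


Expr = Union[Op, str]

# Hypothetical intrinsic table (an assumption for illustration only).
# Leaves that start with "?" are wildcards that bind to any subexpression.
PATTERNS = [
    ("mad", Op("add", (Op("mul", ("?a", "?b")), "?c"))),
]


def match(pattern: Expr, expr: Expr, binds: Dict[str, Expr]) -> bool:
    """Return True if expr has the shape of pattern, recording wildcard bindings."""
    if isinstance(pattern, str):
        if pattern.startswith("?"):
            if pattern in binds:
                return binds[pattern] == expr
            binds[pattern] = expr
            return True
        return pattern == expr
    if not isinstance(expr, Op):
        return False
    if expr.name != pattern.name or len(expr.args) != len(pattern.args):
        return False
    return all(match(p, e, binds) for p, e in zip(pattern.args, expr.args))


def rewrite(expr: Expr) -> Expr:
    """Rewrite bottom-up, replacing each matched subtree with its intrinsic."""
    if isinstance(expr, str):
        return expr
    expr = Op(expr.name, tuple(rewrite(a) for a in expr.args))
    for intrinsic, pattern in PATTERNS:
        binds: Dict[str, Expr] = {}
        if match(pattern, expr, binds):
            # Operands are emitted in wildcard-name order: ?a, ?b, ?c.
            return Op(intrinsic, tuple(binds[k] for k in sorted(binds)))
    return expr


# x * y + z is recognized as one compound operation.
fused = rewrite(Op("add", (Op("mul", ("x", "y")), "z")))
```

A table-driven matcher like this hints at how retargeting could work in principle: supporting a different architecture would mean swapping the pattern table rather than editing application code. A real compiler would match on its intermediate representation and would also have to handle commutativity, types, and data dependences, none of which this sketch attempts.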

Cited By

  • (2017) Generation of Customized Accelerators for Loop Pipelining of Binary Instruction Traces. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25:1 (21-34). DOI: 10.1109/TVLSI.2016.2573640. Online publication date: 1-Jan-2017

Published In

CASES '14: Proceedings of the 2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems
October 2014
241 pages
ISBN:9781450330503
DOI:10.1145/2656106
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

ESWEEK'14
ESWEEK'14: TENTH EMBEDDED SYSTEM WEEK
October 12 - 17, 2014
New Delhi, India

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%
