ABSTRACT
Industry's demand for flexible embedded solutions that combine high performance with short time-to-market has led to the development of configurable and extensible processors. These pre-verified application-specific processors build on proven baseline cores while allowing some degree of customization through user-defined instruction set extensions (ISEs), implemented as functional units in an extended micro-architecture. The traditional ISE design flow starts from the plain C sources of the target application and, after ISE identification and synthesis stages, produces a modified source file with explicit handles to the new machine instructions. Further code optimization is left to the compiler. In this paper we develop a novel approach: the combined exploration of source-level transformations and ISE identification. We have coupled an automated code transformation tool with an ISE generator to explore the potential benefits of this combination. The resulting flow applies up to 50 transformations from a selection of 70 and synthesizes ISEs for the transformed code. Performance has been measured on 26 applications from the SNU-RT and UTDSP benchmarks. We show that the instruction set extensions generated by automated tools are heavily influenced by source code structure. Our results demonstrate that combining source-level transformations with instruction set extensions can yield average performance improvements of 47%. This outperforms instruction set extensions applied in isolation, and in extreme cases yields a speedup of 2.85.
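The claim that ISE candidates depend on source code structure can be illustrated with a minimal C sketch. It rests on assumptions that are not taken from the paper: loop unrolling stands in for one possible source-level transformation, and `ise_mac2` is a hypothetical custom instruction, emulated here as an ordinary C function rather than a real intrinsic.

```c
#include <assert.h>

/* Hypothetical stand-in for a custom instruction (an assumption for
   illustration): in a real flow this would be a handle to a new functional
   unit; here it is simply emulated in software. */
static int ise_mac2(int acc, int a0, int b0, int a1, int b1) {
    return acc + a0 * b0 + a1 * b1;
}

/* Original source: each iteration exposes a single multiply-add, so an
   ISE identifier sees only a small dataflow pattern in the loop body. */
static int dot_plain(const int *a, const int *b, int n) {
    assert(n >= 0);
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* After unrolling by two at source level: the loop body now contains two
   multiplies and two adds, a larger candidate that the hypothetical
   ise_mac2 instruction covers in one operation. */
static int dot_unrolled(const int *a, const int *b, int n) {
    assert(n >= 0);
    int acc = 0;
    int i = 0;
    for (; i + 1 < n; i += 2)
        acc = ise_mac2(acc, a[i], b[i], a[i + 1], b[i + 1]);
    if (i < n)                      /* leftover element when n is odd */
        acc += a[i] * b[i];
    return acc;
}
```

Both functions compute the same result; only the shape of the loop body, and therefore the pattern available for extension, differs.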
Combining source-to-source transformations and processor instruction set extensions for the automated design-space exploration of embedded systems
Proceedings of the 2007 LCTES conference