ABSTRACT
Designing extensible instructions is a computationally complex task, due to the large design space each instruction is exposed to. One method of speeding up the design cycle is to characterize instructions and estimate their peculiarities during a design exploration. In this paper, we study and derive three estimation models for extensible instructions: area overhead, latency, and power consumption under a wide range of customization parameters. System decomposition and regression analysis are used as the underlying methods to characterize and analyze extensible instructions. We verify our estimation models using automatically and manually generated extensible instructions, plus extensible instructions used in large real-world applications. The mean absolute errors of our estimation models are as small as 3.4% (6.7% max.) for area overhead, 5.9% (9.4% max.) for latency, and 4.2% (7.2% max.) for power consumption, compared to estimation through the time-consuming synthesis and simulation steps using commercial tools. Our estimation models achieve an average speedup of three orders of magnitude over the commercial tools and thus enable us to conduct a fast and extensive design space exploration that would otherwise not be possible. The estimation models are integrated into our extensible processor tool suite.
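As a rough illustration of the regression-based estimation and the mean-absolute-error metric mentioned above, the sketch below fits a one-variable least-squares line and scores it against reference values. It is not the paper's model: the single predictor (operand bit width), the function names `fit_line` and `mean_abs_pct_error`, and all data values are hypothetical and chosen only for demonstration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mean_abs_pct_error(estimates, references):
    """Mean absolute error of the estimates, as a percentage of the references."""
    return 100.0 * mean(abs(e - r) / abs(r) for e, r in zip(estimates, references))


# Hypothetical characterization data (assumed, not from the paper):
# operand bit width of an instruction vs. its synthesized area overhead.
widths = [8, 16, 32, 64]
areas = [410.0, 790.0, 1620.0, 3180.0]

intercept, slope = fit_line(widths, areas)
estimates = [intercept + slope * w for w in widths]
error = mean_abs_pct_error(estimates, areas)
print(f"area ~= {intercept:.1f} + {slope:.1f} * width, mean abs. error {error:.2f}%")
```

A fitted model of this kind can be evaluated almost instantly for each candidate instruction, which is where the speedup over running synthesis and simulation would come from.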