ABSTRACT
Analytical modeling is applied to the automated design of application-specific superscalar processors. An analytical method bridges the gap between the size of the design space and the time required for detailed cycle-accurate simulations. The proposed design framework takes as inputs the design targets (upper bounds on execution time, area, and energy), the design alternatives, and one or more application programs. The output is the set of out-of-order superscalar processors that are Pareto-optimal with respect to performance, energy, and area. The core of the framework consists of analytical performance and energy-activity models and an analytical model-based design optimization process.
For a set of benchmark programs and a design space of 2000 designs, the design framework arrives at all performance-energy-area Pareto-optimal design points within 16 minutes on a 2 GHz Pentium-4. In contrast, it is estimated that a naive exhaustive search based on cycle-accurate simulation would require at least two months to arrive at the Pareto-optimal design points for the same design space.
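To make the notion of a performance-energy-area Pareto-optimal set concrete, the following is a minimal sketch, not the framework's actual optimization process. It assumes each candidate design has already been assigned an execution time, energy, and area (in the framework these would come from the analytical models; here the numbers are invented), and it uses a brute-force dominance check. The names `Design`, `dominates`, and `pareto_optimal` are hypothetical.

```python
from typing import List, NamedTuple


class Design(NamedTuple):
    """One candidate processor; lower is better for every metric."""
    name: str
    time: float    # execution time (arbitrary units, made-up values)
    energy: float  # energy (arbitrary units, made-up values)
    area: float    # area (arbitrary units, made-up values)


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b on all metrics and strictly better on one."""
    no_worse = a.time <= b.time and a.energy <= b.energy and a.area <= b.area
    better = a.time < b.time or a.energy < b.energy or a.area < b.area
    return no_worse and better


def pareto_optimal(designs: List[Design], max_time: float,
                   max_energy: float, max_area: float) -> List[Design]:
    """Drop designs that violate the upper-bound targets, then keep
    only those not dominated by any other feasible design."""
    feasible = [d for d in designs
                if d.time <= max_time
                and d.energy <= max_energy
                and d.area <= max_area]
    return [d for d in feasible
            if not any(dominates(other, d) for other in feasible)]


# Illustrative candidates only; these are not results from the paper.
candidates = [
    Design("A", time=1.0, energy=5.0, area=3.0),
    Design("B", time=2.0, energy=4.0, area=3.0),
    Design("C", time=2.0, energy=6.0, area=4.0),  # dominated by A
    Design("D", time=0.5, energy=9.0, area=9.0),  # exceeds the area bound
]

front = pareto_optimal(candidates, max_time=3.0, max_energy=10.0, max_area=5.0)
print([d.name for d in front])
```

The pairwise check costs a number of comparisons quadratic in the number of feasible designs, which is negligible for a space of a few thousand points; the expensive part of any such search is obtaining the three metrics for each design, which is where analytical models replace cycle-accurate simulation.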
Index Terms
- Automated design of application specific superscalar processors: an analytical approach