ABSTRACT
A proposed performance model for superscalar processors consists of 1) a component that models the relationship between instructions issued per cycle and the size of the instruction window under ideal conditions, and 2) methods for calculating transient performance penalties due to branch mispredictions, instruction cache misses, and data cache misses. Using trace-derived data dependence information, data and instruction cache miss rates, and branch misprediction rates as inputs, the model can arrive at performance estimates for a typical superscalar processor that are within 5.8% of detailed simulation on average and within 13% in the worst case. The model also provides insights into the workings of superscalar processors and long-term microarchitecture trends such as pipeline depths and issue widths.
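The two-part structure described above can be illustrated with a minimal sketch. It assumes that the ideal-condition issue rate and the per-event penalties are already known, and that the penalties simply add to the ideal cycles per instruction. That additive combination, the names `MissEvent` and `estimate_cpi`, and every numeric value below are illustrative assumptions, not the paper's actual equations or data.

```python
from dataclasses import dataclass


@dataclass
class MissEvent:
    """One class of miss event. All values are hypothetical inputs."""
    name: str
    events_per_instruction: float  # e.g. mispredictions per instruction
    penalty_cycles: float          # assumed average cycles lost per event


def estimate_cpi(ideal_ipc: float, events: list[MissEvent]) -> float:
    """Return ideal CPI plus the summed per-instruction cost of each miss event.

    Assumption: penalties are independent and add linearly.
    """
    if ideal_ipc <= 0:
        raise ValueError("ideal_ipc must be positive")
    base_cpi = 1.0 / ideal_ipc
    penalty_cpi = sum(e.events_per_instruction * e.penalty_cycles for e in events)
    return base_cpi + penalty_cpi


# Hypothetical rates and penalties, chosen only to exercise the function.
events = [
    MissEvent("branch misprediction", 0.01, 10.0),
    MissEvent("instruction cache miss", 0.002, 8.0),
    MissEvent("data cache miss", 0.001, 200.0),
]
cpi = estimate_cpi(ideal_ipc=4.0, events=events)
print(f"estimated CPI = {cpi:.3f}, IPC = {1.0 / cpi:.3f}")
```

Keeping each miss-event class as a separate term makes it easy to see which one dominates the estimate for a given set of inputs.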
A First-Order Superscalar Processor Model
ISCA 2004