A mechanistic performance model for superscalar out-of-order processors

Authors:
Stijn Eyerman, Ghent University, Ghent, Belgium
Lieven Eeckhout, Ghent University, Ghent, Belgium
Tejas Karkhanis, Advanced Micro Devices, Sunnyvale, CA
James E. Smith, University of Wisconsin–Madison, Madison, WI

ACM Transactions on Computer Systems, Volume 27, Issue 2, Article No. 3, pp. 1–37. https://doi.org/10.1145/1534909.1534910

Published: 29 May 2009

Abstract

A mechanistic model for out-of-order superscalar processors is developed and then applied to the study of microarchitecture resource scaling. The model divides execution time into intervals separated by disruptive miss events such as branch mispredictions and cache misses. Each type of miss event results in characterizable performance behavior for the execution time interval. By considering an interval's type and length (measured in instructions), execution time can be predicted for the interval. Overall execution time is then determined by aggregating the execution time over all intervals. The mechanistic model provides several advantages over prior modeling approaches, and, when estimating performance, it differs from detailed simulation of a 4-wide out-of-order processor by an average of 7%.
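
The interval decomposition described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the authors' model: it assumes the processor sustains its dispatch width between miss events and that each miss-event type adds a fixed penalty. The `PENALTY_CYCLES` values, the `Interval` type, and the function names are all hypothetical, and the sketch ignores effects such as overlapping long-latency misses, which a fuller model would need to handle.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical miss-event penalties in cycles; placeholders only, not values from the paper.
PENALTY_CYCLES = {
    "none": 0,               # interval ends without a disruptive miss event
    "branch_mispredict": 20,
    "icache_miss": 10,
    "long_dcache_miss": 200,
}


@dataclass(frozen=True)
class Interval:
    """A run of instructions terminated by (at most) one miss event."""
    length: int  # number of instructions in the interval
    event: str   # key into PENALTY_CYCLES; unknown keys raise KeyError


def interval_cycles(iv: Interval, dispatch_width: int) -> float:
    """Estimate one interval's cycles from its length and its miss-event type."""
    if dispatch_width <= 0:
        raise ValueError("dispatch_width must be positive")
    # Assumption: the core dispatches at full width between miss events...
    steady_state = iv.length / dispatch_width
    # ...and then pays a fixed penalty determined only by the event type.
    return steady_state + PENALTY_CYCLES[iv.event]


def total_cycles(intervals: Iterable[Interval], dispatch_width: int) -> float:
    """Aggregate the per-interval estimates into an overall execution time."""
    return sum(interval_cycles(iv, dispatch_width) for iv in intervals)


if __name__ == "__main__":
    trace = [
        Interval(400, "branch_mispredict"),
        Interval(1000, "long_dcache_miss"),
        Interval(200, "none"),
    ]
    cycles = total_cycles(trace, dispatch_width=4)
    instructions = sum(iv.length for iv in trace)
    print(f"cycles={cycles:.1f}  CPI={cycles / instructions:.4f}")
```

For this three-interval trace on a 4-wide machine, the sketch reports 620 cycles for 1,600 instructions (CPI 0.3875): 400 cycles of steady-state dispatch plus 220 cycles of hypothetical miss penalties.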

The mechanistic model is applied to the general problem of resource scaling in out-of-order superscalar processors. First, we use the model to determine size relationships among microarchitecture structures in a balanced processor design. Second, we use the mechanistic model to study scaling of both pipeline depth and width in balanced processor designs. We corroborate previous results in this area and provide new results. For example, we show that at optimal design points, the pipeline depth times the square root of the processor width is nearly constant. Finally, we consider the behavior of unbalanced, overprovisioned processor designs based on insight gained from the mechanistic model. We show that in certain situations an overprovisioned processor may lead to improved overall performance. Designs where a processor's dispatch width is wider than its issue width are of particular interest.
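
The depth-width relation stated above has a simple arithmetic consequence: if depth × √width stays roughly constant at optimal design points, the optimal depth for a new width can be estimated from one known optimal design. The sketch below assumes the relation holds exactly and uses a hypothetical reference design (20 stages at width 4); neither the reference values nor the function name come from the paper, and because the abstract only says the product is nearly constant, the results indicate a trend rather than a design rule.

```python
import math


def implied_optimal_depth(width: float, ref_depth: float, ref_width: float) -> float:
    """Depth that keeps depth * sqrt(width) equal to its value at a reference design.

    Assumes the approximate invariant holds exactly; ref_depth and ref_width
    describe a hypothetical optimal reference design.
    """
    if width <= 0 or ref_width <= 0:
        raise ValueError("widths must be positive")
    invariant = ref_depth * math.sqrt(ref_width)  # assumed constant across optimal designs
    return invariant / math.sqrt(width)


if __name__ == "__main__":
    # Hypothetical reference point: a 4-wide design whose optimal depth is 20 stages.
    for w in (1, 2, 4, 8, 16):
        print(w, round(implied_optimal_depth(w, ref_depth=20, ref_width=4), 1))
```

Under these assumptions, quadrupling the width from 4 to 16 halves the implied optimal depth (20 to 10 stages), while narrowing to width 1 doubles it to 40.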

Index Terms

Computer systems organization → Architectures → Serial architectures → Pipeline computing
Computing methodologies → Modeling and simulation → Model development and analysis → Modeling methodologies

Published in

ACM Transactions on Computer Systems Volume 27, Issue 2
May 2009
76 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/1534909

Copyright © 2009 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 29 May 2009
- Accepted: 1 February 2009
- Received: 1 February 2008

Author Tags
Superscalar out-of-order processor
analytical modeling
balanced processor design
mechanistic modeling
overprovisioned processor design
performance modeling
pipeline depth
pipeline width
resource scaling
wide front-end dispatch processors
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 151
- Total Downloads: 2,219
- Downloads (last 12 months): 163
- Downloads (last 6 weeks): 23
