DOI: 10.1145/1152154.1152192
Article

Adaptive reorder buffers for SMT processors

Published: 16 September 2006

Abstract

In SMT processors, the complex interplay between private and shared datapath resources must be considered in order to realize the full performance potential. In this paper, we show that blindly increasing the size of the per-thread reorder buffers to provide a larger number of in-flight instructions does not yield the expected performance gains; on the contrary, it degrades instruction throughput for virtually all multithreaded workloads. The reason for this performance loss is the excessive pressure on the shared datapath resources, especially the instruction scheduling logic. We propose intelligent mechanisms that dynamically adapt the number of reorder buffer entries allocated to each thread, avoiding allocations that would detrimentally impact the scheduler. We achieve this by categorizing program execution into issue-bound and commit-bound phases and allocating additional buffer entries only to threads operating in commit-bound phases. Our adaptive technique improves instruction throughput by 21% and the fairness metric by 10% compared to the best-performing baseline configuration with static ROBs.
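The phase-based allocation policy described in the abstract can be illustrated with a small model. This is a minimal sketch, not the authors' mechanism: the abstract does not say how phases are detected or how many entries are granted, so the per-interval stall counters (`rob_full_stalls`, `iq_full_stalls`), the "ROB-full stalls dominate" classification rule, the even split of a shared pool of extra entries, and the sizes `BASE_ROB_ENTRIES` and `EXTRA_ROB_ENTRIES` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sizes for illustration only; the abstract gives no ROB sizes.
BASE_ROB_ENTRIES = 32   # entries every thread always owns
EXTRA_ROB_ENTRIES = 96  # shared pool that may be lent to commit-bound threads


@dataclass
class ThreadStats:
    """Per-thread counters over one sampling interval (hypothetical)."""
    rob_full_stalls: int  # cycles dispatch stalled because this thread's ROB was full
    iq_full_stalls: int   # cycles dispatch stalled because the shared issue queue was full


def classify(stats: ThreadStats) -> str:
    """Assumed heuristic: commit-bound if ROB-full stalls outnumber IQ-full stalls."""
    if stats.rob_full_stalls > stats.iq_full_stalls:
        return "commit-bound"
    return "issue-bound"


def allocate_rob(threads: dict[str, ThreadStats],
                 base: int = BASE_ROB_ENTRIES,
                 pool: int = EXTRA_ROB_ENTRIES) -> dict[str, int]:
    """Give every thread its base ROB size, then split the extra pool evenly
    among commit-bound threads only; issue-bound threads get no extra entries."""
    sizes = {name: base for name in threads}
    commit_bound = [name for name, s in threads.items()
                    if classify(s) == "commit-bound"]
    if commit_bound:
        share, remainder = divmod(pool, len(commit_bound))
        for i, name in enumerate(commit_bound):
            # Leftover entries go one each to the first `remainder` threads.
            sizes[name] += share + (1 if i < remainder else 0)
    return sizes


if __name__ == "__main__":
    stats = {
        "t0": ThreadStats(rob_full_stalls=900, iq_full_stalls=100),
        "t1": ThreadStats(rob_full_stalls=50, iq_full_stalls=700),
        "t2": ThreadStats(rob_full_stalls=400, iq_full_stalls=300),
        "t3": ThreadStats(rob_full_stalls=0, iq_full_stalls=0),
    }
    print(allocate_rob(stats))  # {'t0': 80, 't1': 32, 't2': 80, 't3': 32}
```

With these sample counters, threads t0 and t2 are classified as commit-bound and each grows to 80 entries, while t1 and t3 keep the 32-entry base. Withholding extra entries from issue-bound threads follows the abstract's reasoning that more in-flight instructions from such threads mainly add pressure on the shared scheduling logic.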


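Cited By

  • (2019) QoSMT. Proceedings of the ACM International Conference on Supercomputing, pp. 206-216. DOI: 10.1145/3330345.3330364. Online publication date: 26-Jun-2019
  • (2019) Stretch: Balancing QoS and Throughput for Colocated Server Workloads on SMT Cores. 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 15-27. DOI: 10.1109/HPCA.2019.00024. Online publication date: Feb-2019
  • (2014) Improving IPC in simultaneous multi-threading (SMT) processors by capping IQ utilization according to dispatched memory instructions. 2014 World Automation Congress (WAC), pp. 893-899. DOI: 10.1109/WAC.2014.6936190. Online publication date: Aug-2014
  • (2013) Recalling instructions from idling threads to maximize resource utilization for simultaneous multi-threading processors. Computers and Electrical Engineering, 39(7), pp. 2031-2044. DOI: 10.1016/j.compeleceng.2013.05.013. Online publication date: 1-Oct-2013
  • (2012) Adaptive instruction dispatching techniques for Simultaneous Multi-Threading (SMT) processors. Computers and Electrical Engineering, 38(6), pp. 1616-1626. DOI: 10.1016/j.compeleceng.2012.06.010. Online publication date: 1-Nov-2012
  • (2011) Utilization-Based Resource Partitioning for Power-Performance Efficiency in SMT Processors. IEEE Transactions on Parallel and Distributed Systems, 22(7), pp. 1150-1163. DOI: 10.1109/TPDS.2010.199. Online publication date: 1-Jul-2011
  • (2009) A Dynamic Resource Allocation Optimization for SMT Processors. Proceedings of the 2009 International Conference on Future Computer and Communication, pp. 353-357. DOI: 10.1109/ICFCC.2009.47. Online publication date: 3-Apr-2009
  • (2009) Design of Non-Critical Path Resource Distributor for SMT Processors. Proceedings of the 2009 International Conference on Computer Engineering and Technology, Volume 02, pp. 48-52. DOI: 10.1109/ICCET.2009.83. Online publication date: 22-Jan-2009
  • (2009) Paired ROBs. Proceedings of the 15th International Euro-Par Conference on Parallel Processing, pp. 309-320. DOI: 10.1007/978-3-642-03869-3_31. Online publication date: 23-Aug-2009
  • (2008) A swarm-inspired resource distribution for SMT processors. Proceedings of the 3rd International Conference on Bio-Inspired Models of Network, Information and Computing Systems, pp. 1-7. DOI: 10.5555/1512504.1512521. Online publication date: 25-Nov-2008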


    Information

    Published In

    ACM Conferences
    PACT '06: Proceedings of the 15th international conference on Parallel architectures and compilation techniques
    September 2006
    308 pages
    ISBN: 159593264X
    DOI: 10.1145/1152154
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. reorder buffer
    2. simultaneous multithreading

    Qualifiers

    • Article

    Conference

    PACT06

    Acceptance Rates

    Overall Acceptance Rate 121 of 471 submissions, 26%


    Article Metrics

    • Downloads (last 12 months): 14
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 09 Feb 2025
