Article

Adaptive reorder buffers for SMT processors

Authors: Joseph Sharkey, Dmitry Ponomarev

PACT '06: Proceedings of the 15th international conference on Parallel architectures and compilation techniques

Pages 244 - 253

https://doi.org/10.1145/1152154.1152192

Published: 16 September 2006

Abstract

In SMT processors, the complex interplay between private and shared datapath resources needs to be considered in order to realize the full performance potential. In this paper, we show that blindly increasing the size of the per-thread reorder buffers to provide a larger number of in-flight instructions does not result in the expected performance gains but, quite in contrast, degrades the instruction throughput for virtually all multithreaded workloads. The reason for this performance loss is the excessive pressure on the shared datapath resources, especially the instruction scheduling logic. We propose intelligent mechanisms for dynamically adapting the number of reorder buffer entries allocated to each thread in an effort to avoid such allocations if they detrimentally impact the scheduler. We achieve this goal through categorizing the program execution into issue-bound and commit-bound phases and only performing the buffer allocations to the threads operating in commit-bound phases. Our adaptive technique achieves improvements of 21% in instruction throughput and 10% in the fairness metric compared to the best performing baseline configuration with static ROBs.
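
The allocation policy summarized in the abstract can be sketched as follows. This is a minimal illustration, not the authors' mechanism: the abstract does not say how issue-bound and commit-bound phases are detected, so the counters (`rob_full_cycles`, `iq_waiting_avg`), the thresholds, and the function names here are all hypothetical assumptions. The only behavior taken from the abstract is that extra reorder buffer entries go to threads in commit-bound phases and are withheld from threads in issue-bound phases.

```python
from dataclasses import dataclass


@dataclass
class ThreadStats:
    """Per-interval counters for one hardware thread (hypothetical names)."""
    rob_full_cycles: int   # cycles in which dispatch stalled on a full per-thread ROB
    iq_waiting_avg: float  # average number of the thread's instructions waiting in the shared issue queue


def classify_phase(stats, interval_cycles, rob_stall_frac=0.25, iq_wait_limit=8.0):
    """Label a thread 'commit-bound' or 'issue-bound'.

    Assumed heuristic: a thread is commit-bound when it often stalls on a
    full ROB while holding few waiting instructions in the issue queue.
    Both thresholds are illustrative, not values from the paper.
    """
    rob_pressure = stats.rob_full_cycles / interval_cycles
    if rob_pressure >= rob_stall_frac and stats.iq_waiting_avg <= iq_wait_limit:
        return "commit-bound"
    return "issue-bound"


def allocate_rob(base_entries, extra_pool, stats_by_thread, interval_cycles):
    """Split a pool of extra ROB entries evenly among commit-bound threads.

    Issue-bound threads keep only their base allocation, so a larger
    instruction window does not add pressure on the shared scheduler.
    """
    phases = {tid: classify_phase(s, interval_cycles)
              for tid, s in stats_by_thread.items()}
    eligible = [tid for tid, phase in phases.items() if phase == "commit-bound"]
    alloc = {tid: base_entries for tid in stats_by_thread}
    if eligible:
        share = extra_pool // len(eligible)
        for tid in eligible:
            alloc[tid] += share
    return alloc, phases


if __name__ == "__main__":
    stats = {
        0: ThreadStats(rob_full_cycles=400, iq_waiting_avg=3.0),
        1: ThreadStats(rob_full_cycles=50, iq_waiting_avg=20.0),
    }
    alloc, phases = allocate_rob(base_entries=32, extra_pool=64,
                                 stats_by_thread=stats, interval_cycles=1000)
    print(phases)
    print(alloc)
```

In this sketch the classification is recomputed once per sampling interval, so a thread that moves from a commit-bound phase to an issue-bound phase loses its extra entries at the next interval boundary.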

Index Terms

Adaptive reorder buffers for SMT processors
1. Computer systems organization
  1. Architectures

Published In

PACT '06: Proceedings of the 15th international conference on Parallel architectures and compilation techniques

September 2006

308 pages

ISBN:159593264X

DOI:10.1145/1152154

General Chair: Erik Altman, IBM Research, USA

Program Chairs: Kevin Skadron, University of Virginia, USA; Ben Zorn, Microsoft Research, USA

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Sponsors

ACM: Association for Computing Machinery

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

PACT06

Sponsor:

ACM

PACT06: 2006 International Conference on Parallel Architectures and Compilation Techniques

September 16 - 20, 2006

Seattle, Washington, USA

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 15
Total Downloads: 392
Downloads (Last 12 months): 14
Downloads (Last 6 weeks): 0

Reflects downloads up to 09 Feb 2025

Citations

Cited By

Jin X, Zhou Y, Huang B, Yu Z, Zhan X, Wang H, Wang S, Yu N, Sun N, Bao Y, Eigenmann R, Ding C, McKee S (2019). QoSMT. Proceedings of the ACM International Conference on Supercomputing, pp. 206-216. doi:10.1145/3330345.3330364. Online publication date: 26-Jun-2019.
https://dl.acm.org/doi/10.1145/3330345.3330364
Margaritov A, Gupta S, Gonzalez-Alberquilla R, Grot B (2019). Stretch: Balancing QoS and Throughput for Colocated Server Workloads on SMT Cores. 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 15-27. doi:10.1109/HPCA.2019.00024. Online publication date: Feb-2019.
https://doi.org/10.1109/HPCA.2019.00024
Sahba A, Sahba R, Wei-Ming Lin (2014). Improving IPC in simultaneous multi-threading (SMT) processors by capping IQ utilization according to dispatched memory instructions. 2014 World Automation Congress (WAC), pp. 893-899. doi:10.1109/WAC.2014.6936190. Online publication date: Aug-2014.
https://doi.org/10.1109/WAC.2014.6936190
Zhang Y, Douglas C, Lin W (2013). Recalling instructions from idling threads to maximize resource utilization for simultaneous multi-threading processors. Computers and Electrical Engineering 39:7, pp. 2031-2044. doi:10.1016/j.compeleceng.2013.05.013. Online publication date: 1-Oct-2013.
https://dl.acm.org/doi/10.1016/j.compeleceng.2013.05.013
Debnath M, Lin W, John E (2012). Adaptive instruction dispatching techniques for Simultaneous Multi-Threading (SMT) processors. Computers and Electrical Engineering 38:6, pp. 1616-1626. doi:10.1016/j.compeleceng.2012.06.010. Online publication date: 1-Nov-2012.
https://dl.acm.org/doi/10.1016/j.compeleceng.2012.06.010
Wang H, Koren I, Krishna C (2011). Utilization-Based Resource Partitioning for Power-Performance Efficiency in SMT Processors. IEEE Transactions on Parallel and Distributed Systems 22:7, pp. 1150-1163. doi:10.1109/TPDS.2010.199. Online publication date: 1-Jul-2011.
https://dl.acm.org/doi/10.1109/TPDS.2010.199
Chen H, Ping L, Lu K, Jiang X (2009). A Dynamic Resource Allocation Optimization for SMT Processors. Proceedings of the 2009 International Conference on Future Computer and Communication, pp. 353-357. doi:10.1109/ICFCC.2009.47. Online publication date: 3-Apr-2009.
https://dl.acm.org/doi/10.1109/ICFCC.2009.47
Chen H, Ping L, Chen X, Lu K (2009). Design of Non-Critical Path Resource Distributor for SMT Processors. Proceedings of the 2009 International Conference on Computer Engineering and Technology - Volume 02, pp. 48-52. doi:10.1109/ICCET.2009.83. Online publication date: 22-Jan-2009.
https://dl.acm.org/doi/10.1109/ICCET.2009.83
Ubal R, Sahuquillo J, Petit S, López P (2009). Paired ROBs. Proceedings of the 15th International Euro-Par Conference on Parallel Processing, pp. 309-320. doi:10.1007/978-3-642-03869-3_31. Online publication date: 23-Aug-2009.
https://dl.acm.org/doi/10.1007/978-3-642-03869-3_31
Chen H, Ping L, Pan X, Lu K, Jiang X, Murata M, Akan O (2008). A swarm-inspired resource distribution for SMT processors. Proceedings of the 3rd International Conference on Bio-Inspired Models of Network, Information and Computing Systems, pp. 1-7. doi:10.5555/1512504.1512521. Online publication date: 25-Nov-2008.
https://dl.acm.org/doi/10.5555/1512504.1512521