Improving single-thread performance with fine-grain state maintenance

Authors:

Soner Õnder

CF '08: Proceedings of the 5th conference on Computing frontiers

Pages 251 - 260

https://doi.org/10.1145/1366230.1366274

Published: 05 May 2008

Abstract

We show that a multi-threaded processor that is aware of the processor state in a fine-grain manner can improve single-thread performance significantly by assigning the task of maintaining the correct processor state to an independent thread. We develop fine-grain state maintenance techniques that can be applied in multi-threaded environments and present a fine-grain state application of runahead execution in which the data values dependent on a missed load are treated as damaged values. These values are verified and recovered as necessary by an independent thread. We evaluate an SMT-like fine-grain state processor and show that it obtains an average of 38.9% and up to 160.0% better performance than coarse-grain baseline processors on the SPEC CFP2000 benchmark suite.
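
The idea of treating load-dependent values as damaged can be illustrated with a small software analogy. The following Python sketch is not the paper's hardware mechanism; the toy program, the register names, the placeholder value 0 for a missed load, and the functions `run_ahead` and `recover` are all hypothetical, and each register is assumed to be written only once. The first pass keeps executing past a missed load and records which instructions produced damaged values; the second pass, standing in for the independent thread, re-executes only those instructions once the load value is available.

```python
import operator

# Toy program (hypothetical): (destination, operation, source operands).
# A "load" reads its single operand as an address in the memory dict.
PROGRAM = [
    ("r1", "load", ("A",)),
    ("r2", operator.add, ("r1", "r0")),
    ("r3", operator.mul, ("r0", "r0")),
    ("r4", operator.add, ("r2", "r3")),
]


def run_ahead(program, regs, memory, missed):
    """Execute without stalling and return the indices of damaged instructions.

    A load whose address is in `missed` writes a placeholder (0) and is marked
    damaged; any instruction that reads a damaged register is damaged too.
    """
    damaged_regs = set()
    damaged_instrs = []
    for index, (dest, op, srcs) in enumerate(program):
        if op == "load":
            is_damaged = srcs[0] in missed
            regs[dest] = 0 if is_damaged else memory[srcs[0]]
        else:
            is_damaged = any(s in damaged_regs for s in srcs)
            regs[dest] = op(*(regs[s] for s in srcs))
        if is_damaged:
            damaged_regs.add(dest)
            damaged_instrs.append(index)
        else:
            damaged_regs.discard(dest)
    return damaged_instrs


def recover(program, regs, memory, damaged_instrs):
    """Re-execute only the damaged instructions, in program order.

    Assumes the missed data has arrived and that no register is overwritten
    between its damaged write and its readers (single assignment).
    """
    for index in damaged_instrs:
        dest, op, srcs = program[index]
        if op == "load":
            regs[dest] = memory[srcs[0]]
        else:
            regs[dest] = op(*(regs[s] for s in srcs))


if __name__ == "__main__":
    regs = {"r0": 3}
    memory = {"A": 10}
    damaged = run_ahead(PROGRAM, regs, memory, missed={"A"})
    recover(PROGRAM, regs, memory, damaged)
    print(damaged, regs)
```

In this sketch the multiply into `r3` does not depend on the missed load, so it is never re-executed; only the load and its two dependents are repaired. That selective repair is the contrast with a coarse-grain scheme that would discard and redo everything after the miss.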

Index Terms

Improving single-thread performance with fine-grain state maintenance
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
    2. Serial architectures

Published In

CF '08: Proceedings of the 5th conference on Computing frontiers

May 2008

334 pages

ISBN: 9781605580777

DOI: 10.1145/1366230

General Chair: Alex Ramirez (UPC, Spain)

Program Chairs: Gianfranco Biliardi (University of Padova, Italy), Michael Gschwind (IBM TJ Watson Research Center, USA)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CF '08: Computing Frontiers Conference

May 5 - 7, 2008

Ischia, Italy

Acceptance Rates

Overall Acceptance Rate 273 of 785 submissions, 35%
