
Improving single-thread performance with fine-grain state maintenance

Published: 05 May 2008

Abstract

We show that a multi-threaded processor that tracks processor state at a fine grain can significantly improve single-thread performance by assigning the task of maintaining the correct processor state to an independent thread. We develop fine-grain state maintenance techniques that can be applied in multi-threaded environments and present a fine-grain state application of runahead execution in which data values that depend on a missed load are treated as damaged values. An independent thread verifies these values and recovers them as necessary. We evaluate an SMT-like fine-grain state processor and show that it achieves on average 38.9%, and up to 160.0%, better performance than coarse-grain baseline processors on the SPEC CFP2000 benchmark suite.
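The idea of tagging load-dependent values as damaged and repairing them in a separate pass can be illustrated with a toy software model. This is only a sketch under stated assumptions and not the paper's hardware mechanism: the instruction encoding, the names `main_thread` and `recovery_thread`, the placeholder value 0 for a missed load, and the single-assignment register names are all hypothetical choices made for the illustration.

```python
import operator

# Hypothetical toy ISA: ("load", dest, addr), ("li", dest, imm),
# ("add", dest, src1, src2), ("mul", dest, src1, src2).
# Assumption: every register is written exactly once (single assignment),
# so a damaged register name identifies one damaged instruction.
OPS = {"add": operator.add, "mul": operator.mul}


def execute(ins, regs, memory):
    """Compute the result of one instruction from the current state."""
    op, _dest, *args = ins
    if op == "load":
        return memory[args[0]]
    if op == "li":
        return args[0]
    return OPS[op](regs[args[0]], regs[args[1]])


def sources(ins):
    """Return the source registers of an instruction (none for load/li)."""
    op, _dest, *args = ins
    return args if op in OPS else []


def main_thread(program, memory, missed):
    """Run without stalling: a missed load gets a placeholder and is tagged
    damaged; any value computed from a damaged source is tagged as well."""
    regs, damaged = {}, set()
    for ins in program:
        op, dest = ins[0], ins[1]
        if op == "load" and ins[2] in missed:
            regs[dest] = 0  # assumed placeholder for the missing data
            damaged.add(dest)
        else:
            regs[dest] = execute(ins, regs, memory)
            if any(s in damaged for s in sources(ins)):
                damaged.add(dest)
    return regs, damaged


def recovery_thread(program, memory, regs, damaged):
    """Once the load data is available, re-execute only damaged
    instructions in program order and return the repaired state."""
    repaired = dict(regs)
    reexecuted = 0
    for ins in program:
        if ins[1] in damaged:
            repaired[ins[1]] = execute(ins, repaired, memory)
            reexecuted += 1
    return repaired, reexecuted


program = [
    ("load", "r1", 100),       # assumed to miss in the cache
    ("li", "r2", 5),
    ("add", "r3", "r1", "r2"),  # depends on the missed load
    ("li", "r4", 7),
    ("mul", "r5", "r2", "r4"),  # independent of the missed load
    ("mul", "r6", "r3", "r4"),  # depends on the missed load
]
memory = {100: 10}

speculative, damaged = main_thread(program, memory, missed={100})
repaired, reexecuted = recovery_thread(program, memory, speculative, damaged)
reference, _ = main_thread(program, memory, missed=set())
```

In this model the independent instructions (`r2`, `r4`, `r5`) are never re-executed; only the forward slice of the missed load is repaired, and the repaired state matches a run with no miss.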


Cited By

  • (2018) High Performance Static Segment On-Chip Memory for Image Processing Applications. Journal of Electronic Testing: Theory and Applications 34:4 (389-404). DOI: 10.1007/s10836-018-5742-9. Online publication date: 1-Aug-2018
  • (2012) Discovering Patterns for Architecture Simulation by Using Sequence Mining. Pattern Discovery Using Sequence Data Mining (212-236). DOI: 10.4018/978-1-61350-056-9.ch013. Online publication date: 2012


      Published In

      cover image ACM Conferences
      CF '08: Proceedings of the 5th conference on Computing frontiers
      May 2008
      334 pages
      ISBN:9781605580777
      DOI:10.1145/1366230
      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. checkpoint
      2. processor state
      3. recovery
      4. runahead
      5. simultaneous multi-threading

      Qualifiers

      • Research-article

      Conference

      CF '08
      CF '08: Computing Frontiers Conference
      May 5 - 7, 2008
      Ischia, Italy

      Acceptance Rates

      Overall Acceptance Rate 273 of 785 submissions, 35%
