Slipstream processors: improving both performance and fault tolerance

Authors:
Karthik Sundaramoorthy

North Carolina State University, Department of Electrical and Computer Engineering, Engineering Graduate Research Center, Campus Box 7914, Raleigh, NC

Zach Purser

North Carolina State University, Department of Electrical and Computer Engineering, Engineering Graduate Research Center, Campus Box 7914, Raleigh, NC

Eric Rotenberg

North Carolina State University, Department of Electrical and Computer Engineering, Engineering Graduate Research Center, Campus Box 7914, Raleigh, NC


ACM SIGARCH Computer Architecture News, Volume 28, Issue 5, Dec. 2000, pp. 257–268. https://doi.org/10.1145/378995.379247

Published: 12 November 2000


Abstract

Processors execute the full dynamic instruction stream to arrive at the final output of a program, yet there exist shorter instruction streams that produce the same overall effect. We propose creating a shorter but otherwise equivalent version of the original program by removing ineffectual computation and computation related to highly predictable control flow. The shortened program is run concurrently with the full program on a chip multiprocessor or simultaneous multithreaded processor, with two key advantages:

1) Improved single-program performance. The shorter program speculatively runs ahead of the full program and supplies the full program with control and data flow outcomes. The full program executes efficiently due to the communicated outcomes, at the same time validating the speculative, shorter program. The two programs combined run faster than the original program alone. Detailed simulations of an example implementation show an average improvement of 7% for the SPEC95 integer benchmarks.

2) Fault tolerance. The shorter program is a subset of the full program, and this partial redundancy is transparently leveraged for detecting and recovering from transient hardware faults.
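The idea in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the authors' hardware design: the `Instr` type, the `removable` flag, and the `run_slipstream` function are hypothetical names introduced here, the "instruction set" is a single add-immediate operation, and the two streams are stepped in lockstep rather than with the short stream running ahead. It shows the two roles described above: the short stream skips instructions predicted to be ineffectual and passes its results forward, while the full stream executes everything, checks those results, and repairs the short stream's state on a mismatch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str                # register written
    src: str                 # register read (missing registers read as 0)
    imm: int                 # immediate added to the source value
    removable: bool = False  # hypothetical flag: predicted ineffectual


def execute(regs, ins):
    """Compute dest = src + imm against the given register file."""
    value = regs.get(ins.src, 0) + ins.imm
    regs[ins.dest] = value
    return value


def run_slipstream(program):
    """Return (final full-stream registers, skipped count, recoveries)."""
    short_regs, full_regs = {}, {}
    skipped = recoveries = 0
    for ins in program:
        # Short stream: skips instructions predicted to be ineffectual.
        if ins.removable:
            skipped += 1
            predicted = None
        else:
            predicted = execute(short_regs, ins)
        # Full stream: executes everything and checks any delivered outcome.
        actual = execute(full_regs, ins)
        if predicted is not None and predicted != actual:
            recoveries += 1
            short_regs = dict(full_regs)  # repair the short stream's state
    return full_regs, skipped, recoveries


# A dead write (r2 = 7 is overwritten before it is read) is safely removed.
good = [
    Instr("r1", "r0", 5),
    Instr("r2", "r0", 7, removable=True),
    Instr("r2", "r1", 1),
    Instr("r3", "r2", 2),
]

# Removing a live write makes the short stream diverge; the full stream
# detects the mismatch and the short stream is repaired.
bad = [
    Instr("r1", "r0", 5, removable=True),
    Instr("r2", "r1", 1),
    Instr("r3", "r2", 2),
]

if __name__ == "__main__":
    print(run_slipstream(good))
    print(run_slipstream(bad))
```

In this model the full stream's result is always authoritative, which mirrors the abstract's point that the full program validates the speculative one. The same comparison would also flag a mismatch caused by a corrupted value in either stream, which is the intuition behind the fault-tolerance claim.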

Published in
ACM SIGARCH Computer Architecture News Volume 28, Issue 5
Special Issue: Proceedings of the ninth international conference on Architectural support for programming languages and operating systems (ASPLOS '00)
Dec. 2000
269 pages
ISSN:0163-5964
DOI:10.1145/378995
Editor: Doug DeGroot (Dallas, TX)
ASPLOS IX: Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
November 2000
271 pages
ISBN:1581133170
DOI:10.1145/378993
Chairmen: Larry Rudolph (MIT, Cambridge, MA), Anoop Gupta (Microsoft)
Copyright © 2000 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States