Abstract
Processors execute the full dynamic instruction stream to arrive at the final output of a program, yet there exist shorter instruction streams that produce the same overall effect. We propose creating a shorter but otherwise equivalent version of the original program by removing ineffectual computation and computation related to highly predictable control flow. The shortened program is run concurrently with the full program on a chip multiprocessor or simultaneous multithreaded processor, with two key advantages: 1) Improved single-program performance. The shorter program speculatively runs ahead of the full program and supplies the full program with control and data flow outcomes. The full program executes efficiently due to the communicated outcomes, at the same time validating the speculative, shorter program. The two programs combined run faster than the original program alone. Detailed simulations of an example implementation show an average improvement of 7% for the SPEC95 integer benchmarks. 2) Fault tolerance. The shorter program is a subset of the full program, and this partial redundancy is transparently leveraged for detecting and recovering from transient hardware faults.
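The idea of a shorter but equivalent instruction stream can be illustrated with a toy model. The sketch below is not the mechanism from the paper, which the abstract does not detail. It assumes a hypothetical three-operand register program and uses a simple backward liveness pass to drop writes that never influence a chosen output register. The names `execute`, `shorten`, and `validate` are illustrative only.

```python
from typing import Dict, List, Set, Tuple, Union

Operand = Union[str, int]                    # register name or integer literal
Instr = Tuple[str, str, Operand, Operand]    # (dest, op, src1, src2)


def execute(program: List[Instr]) -> Dict[str, int]:
    """Run every instruction in order and return the final register file."""
    regs: Dict[str, int] = {}

    def value(src: Operand) -> int:
        return regs[src] if isinstance(src, str) else src

    for dest, op, a, b in program:
        x, y = value(a), value(b)
        regs[dest] = x + y if op == "add" else x * y
    return regs


def shorten(program: List[Instr], live_out: Set[str]) -> List[Instr]:
    """Assumed removal policy: keep only instructions that feed live_out."""
    live = set(live_out)
    kept: List[Instr] = []
    for instr in reversed(program):
        dest, _, a, b = instr
        if dest in live:
            kept.append(instr)
            live.discard(dest)
            live.update(s for s in (a, b) if isinstance(s, str))
        # Otherwise the write is ineffectual for live_out and is skipped.
    kept.reverse()
    return kept


def validate(full: List[Instr], short: List[Instr], live_out: Set[str]) -> bool:
    """The full program checks the shortened one on the observable registers."""
    full_regs, short_regs = execute(full), execute(short)
    return all(full_regs[r] == short_regs[r] for r in live_out)


program: List[Instr] = [
    ("r1", "add", 2, 3),
    ("r2", "mul", "r1", 4),      # overwritten before it is read
    ("r2", "add", "r1", 1),
    ("r3", "mul", "r2", 2),
    ("r4", "add", "r1", "r1"),   # never contributes to r3
]
live_out = {"r3"}
short_program = shorten(program, live_out)
```

In this toy, `short_program` holds three of the five instructions and `validate(program, short_program, live_out)` returns `True`. The two streams run one after the other here, whereas the abstract describes them running concurrently, with the shorter stream passing outcomes forward to the full one.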
Slipstream processors: improving both performance and fault tolerance
ASPLOS IX: Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems