Speculative execution for hiding memory latency
Source
ACM SIGARCH Computer Architecture News
Volume 33, Issue 3 (June 2005)
Special issue: MEDEA 2004 workshop
Pages: 49-56
Year of Publication: 2005
ISSN: 0163-5964
Authors
Alex Pajuelo, Universitat Politècnica de Catalunya, Barcelona, Spain
Antonio González, Universitat Politècnica de Catalunya, Barcelona, Spain
Mateo Valero, Universitat Politècnica de Catalunya, Barcelona, Spain
ABSTRACT
L2 misses are one of the main causes of stalls in current and future microprocessors. In this paper we present a mechanism to speculatively execute instructions that are independent of L2-miss loads, even if no entry in the reorder buffer is available. The proposed mechanism generates future instances of instructions that are expected to be independent of the delinquent load. When these dynamic instructions are later fetched, they use the precomputed data and go directly to the commit stage without executing. The mechanism replicates strided loads found above the L2-miss load that produce the data for the target independent instructions. Instructions following the L2-miss load check whether their source operands have been replicated; if so, multiple speculative instances of them are also generated. The mechanism is built on top of a superscalar processor with an aggressive prefetch scheme. Compared to this baseline, it improves performance by 21%.
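The idea of replicating a strided load and reusing precomputed results can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the names `StridedLoad`, `precompute`, and `execute_or_reuse` are hypothetical, the stride is treated as stable once it has been seen twice in a row, and precomputed results are keyed by load address as a simplification of tracking individual dynamic instruction instances.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class StridedLoad:
    """Hypothetical per-load record: last address seen and the detected stride."""
    last_addr: Optional[int] = None
    stride: Optional[int] = None
    confident: bool = False

    def observe(self, addr: int) -> None:
        """Update the stride after one dynamic instance of the load."""
        if self.last_addr is not None:
            new_stride = addr - self.last_addr
            # Assumption: trust a nonzero stride once it repeats.
            self.confident = new_stride != 0 and new_stride == self.stride
            self.stride = new_stride
        self.last_addr = addr

    def future_addrs(self, count: int) -> List[int]:
        """Addresses of the next `count` instances, or [] if the stride is not trusted."""
        if not self.confident:
            return []
        return [self.last_addr + self.stride * i for i in range(1, count + 1)]


def precompute(memory: Dict[int, int], load: StridedLoad,
               op: Callable[[int], int], count: int) -> Dict[int, int]:
    """Replicate the load `count` instances ahead and apply the dependent operation."""
    return {a: op(memory[a]) for a in load.future_addrs(count) if a in memory}


def execute_or_reuse(memory: Dict[int, int], addr: int, op: Callable[[int], int],
                     table: Dict[int, int]) -> Tuple[int, bool]:
    """Return (result, reused). A table hit models committing without executing."""
    if addr in table:
        return table.pop(addr), True
    return op(memory[addr]), False


memory = {a: a * 10 for a in range(0, 64, 8)}
load = StridedLoad()
for a in (0, 8, 16):          # three instances establish a stride of 8
    load.observe(a)

add_one = lambda v: v + 1     # stand-in for an instruction that consumes the loaded value
table = precompute(memory, load, add_one, count=3)
hit = execute_or_reuse(memory, 24, add_one, table)    # precomputed instance
miss = execute_or_reuse(memory, 48, add_one, table)   # beyond the replicated range
```

In this model, `hit` reuses a stored result while `miss` falls back to normal execution, which mirrors the distinction the abstract draws between precomputed instances and ordinary ones.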