Slice-processors: an implementation of operation-based prediction
Source: Proceedings of the 15th International Conference on Supercomputing
Sorrento, Italy
Pages: 321 - 334  
Year of Publication: 2001
ISBN: 1-58113-410-X
Authors
Andreas Moshovos  Electrical and Computer Engineering, University of Toronto
Dionisios N. Pnevmatikatos  Electronic and Computer Engineering, Technical University of Crete
Amirali Baniasadi  Electrical and Computer Engineering, Northwestern University
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 38,   Citation Count: 27
DOI: http://doi.acm.org/10.1145/377792.377856

ABSTRACT

We describe the Slice Processor micro-architecture, which implements a generalized operation-based prefetching mechanism. Operation-based prefetchers predict the series of operations, or the computation slice, that can be used to calculate forthcoming memory references. This is in contrast to outcome-based predictors, which exploit regularities in the (address) outcome stream. Slice processors are a generalization of existing operation-based prefetching mechanisms, such as stream buffers, where the operation itself is fixed in the design (e.g., address + stride). A slice processor dynamically identifies frequently missing loads and extracts the relevant address computation slices on the fly. These slices are then executed in parallel with the main sequential thread, prefetching memory data. We describe the various support structures and emphasize the design of the slice detection mechanism. We demonstrate that a relatively simple organization can significantly improve performance over an aggressive, dynamically scheduled processor for a set of pointer-intensive programs and for some integer applications from the SPEC'95 suite. In particular, a slice processor that can detect slices of up to 8 instructions extracted over a region of up to 32 instructions improves performance by 11% on average (even if slice detection requires up to 32 cycles). Allowing slices of up to 16 instructions results in an average performance improvement of 15%. Finally, we study how our operation-based predictor interacts with an outcome-based one and find the two mutually beneficial.
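To make the slice detection step more concrete, here is a minimal software sketch. It is not the hardware mechanism described in the paper. It assumes a simplified trace of register-based instructions. The `Insn` record and the `extract_slice` function are hypothetical names introduced only for illustration. Starting from a missing load, the sketch walks backward and keeps only the instructions that produce registers the load's address depends on. The `region=32` and `max_slice=8` defaults mirror the limits quoted in the abstract. Treating the region as the 32 instructions ending at the load is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Insn:
    """One instruction in a hypothetical, simplified trace model."""
    pc: int
    op: str                # e.g. "load", "add", "addi"
    dst: Optional[str]     # destination register, or None
    srcs: Tuple[str, ...]  # source registers (for a load: its address registers)


def extract_slice(trace: List[Insn], miss_index: int,
                  region: int = 32, max_slice: int = 8) -> Optional[List[Insn]]:
    """Return the backward address slice of the load at trace[miss_index].

    Assumption: only the `region` instructions ending at the load are examined.
    Returns None if the slice would exceed `max_slice` instructions (load included).
    """
    target = trace[miss_index]
    if target.op != "load":
        raise ValueError("slice extraction starts from a load")
    start = max(0, miss_index - (region - 1))
    needed = set(target.srcs)   # registers whose producers we still need
    picked = [target]
    for insn in reversed(trace[start:miss_index]):
        if not needed:
            break               # every address input has been traced back
        if insn.dst is not None and insn.dst in needed:
            needed.discard(insn.dst)    # this instruction defines the register...
            needed.update(insn.srcs)    # ...so its own inputs are now needed
            picked.append(insn)
            if len(picked) > max_slice:
                return None             # too long to extract under this limit
    # Registers still in `needed` are live-ins read from the main thread's state.
    picked.reverse()                    # restore program order
    return picked


# Hypothetical linked-list walk, two iterations of: p = p->next; sum += p->val
loop = [
    Insn(0x10, "load", "r1", ("r1",)),       # r1 = r1->next
    Insn(0x14, "addi", "r3", ("r3",)),       # i++ (not part of the slice)
    Insn(0x18, "load", "r2", ("r1",)),       # r2 = r1->val (the missing load)
    Insn(0x1c, "add", "r4", ("r4", "r2")),   # sum += r2 (not part of the slice)
]
trace = loop * 2
s = extract_slice(trace, miss_index=6)
print([hex(i.pc) for i in s])  # ['0x10', '0x10', '0x18']
```

The extracted slice for this example is a chain of `next`-pointer loads followed by the missing load. A fixed address + stride operation cannot express this kind of address computation. A dynamically extracted slice can, and in this sketch the slice still fits within the 8-instruction limit.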

