Efficient emulation of hardware prefetchers via event-driven helper threading
Source: Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT)
Seattle, Washington, USA
Session: Multi-core design II
Pages: 144-153
Year of Publication: 2006
ISBN: 1-59593-264-X
Authors:
Ilya Ganusov, Cornell University, Ithaca, New York
Martin Burtscher, Cornell University, Ithaca, New York
Sponsor: ACM (Association for Computing Machinery)
Publisher: ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1152154.1152178

ABSTRACT

The advance of multi-core architectures provides significant benefits for parallel and throughput-oriented computing, but the performance of individual computation threads does not improve and may even suffer because of increased contention for shared resources. This paper explores the idea of using available general-purpose cores in a chip multiprocessor (CMP) as helper engines for individual threads running on the active cores. We propose a lightweight architectural framework for efficient event-driven software emulation of complex hardware accelerators and describe how this framework can be applied to implement a variety of prefetching techniques. We demonstrate the viability and effectiveness of our framework on a wide range of applications from the SPEC CPU2000 and Olden benchmark suites. On average, our mechanism provides performance benefits within 5% of those of pure hardware implementations. Furthermore, we demonstrate that running event-driven prefetching threads on top of a baseline with a hardware stride prefetcher yields significant speedups for many programs. Finally, we show that our approach provides competitive performance improvements over other hardware approaches for multi-core execution while executing fewer instructions and requiring considerably less hardware support.
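The abstract does not describe the framework's actual interface, so the following is only a rough illustration of the general idea of event-driven helper threading, not the authors' design. In this sketch, assumed for illustration, the main thread posts load events (PC, address) to a queue, and a helper thread consumes them and runs a software model of a PC-indexed stride prefetcher. The class and method names (`StridePrefetchHelper`, `post_load`), the table size, prefetch degree, and confidence threshold are all hypothetical choices. The "prefetches" are only recorded in a list rather than issued to a real cache.

```python
import queue
import threading
from dataclasses import dataclass

_STOP = None  # sentinel event that tells the helper thread to exit


@dataclass
class StrideEntry:
    last_addr: int
    stride: int = 0
    confidence: int = 0  # 2-bit saturating counter (0..3)


class StridePrefetchHelper:
    """Hypothetical software model of a stride prefetcher driven by load events."""

    def __init__(self, table_size=256, degree=2, threshold=2):
        self.table_size = table_size  # direct-mapped, untagged: aliasing PCs share an entry
        self.degree = degree          # how many strides ahead to prefetch
        self.threshold = threshold    # confidence needed before prefetching
        self.table = {}
        self.events = queue.Queue()
        self.prefetches = []          # stand-in for issuing real prefetch instructions
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def post_load(self, pc, addr):
        # Main-thread side: posting an event is cheap; all analysis happens on the helper.
        self.events.put((pc, addr))

    def stop(self):
        # Ask the helper to finish the pending events, then wait for it.
        self.events.put(_STOP)
        self._thread.join()

    def _run(self):
        # Helper-thread event loop: handle events in the order they were posted.
        while True:
            event = self.events.get()
            if event is _STOP:
                break
            self._on_load(*event)

    def _on_load(self, pc, addr):
        idx = pc % self.table_size
        entry = self.table.get(idx)
        if entry is None:
            self.table[idx] = StrideEntry(last_addr=addr)
            return
        stride = addr - entry.last_addr
        if stride != 0 and stride == entry.stride:
            entry.confidence = min(entry.confidence + 1, 3)
        else:
            # New or broken pattern: remember the stride, reset confidence.
            entry.stride = stride
            entry.confidence = 0
        entry.last_addr = addr
        if entry.confidence >= self.threshold:
            for k in range(1, self.degree + 1):
                self.prefetches.append(addr + k * entry.stride)


if __name__ == "__main__":
    helper = StridePrefetchHelper()
    helper.start()
    for a in (1000, 1064, 1128, 1192, 1256):
        helper.post_load(0x400, a)
    helper.stop()
    print(helper.prefetches)  # [1256, 1320, 1320, 1384]
```

With a regular 64-byte stride, the sketch begins prefetching once the same stride has been observed twice in a row. It does not filter duplicate prefetch addresses, which a real hardware or software prefetcher would typically do.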

