| Efficient emulation of hardware prefetchers via event-driven helper threading |
| Full text |
Pdf
(422 KB)
|
| Source
|
PACT
archive
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
table of contents
Seattle, Washington, USA
SESSION: Multi-core design II
table of contents
Pages: 144 - 153
Year of Publication: 2006
ISBN:1-59593-264-X
|
|
Authors
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 9, Downloads (12 Months): 77, Citation Count: 0
|
|
|
ABSTRACT
The advance of multi-core architectures provides significant benefits for parallel and throughput-oriented computing, but the performance of individual computation threads does not improve and may even suffer a penalty because of the increased contention for shared resources. This paper explores the idea of using available general-purpose cores in a CMP as helper engines for individual threads running on the active cores. We propose a lightweight architectural framework for efficient event-driven software emulation of complex hardware accelerators and describe how this framework can be applied to implement a variety of prefetching techniques. We demonstrate the viability and effectiveness of our framework on a wide range of applications from the SPEC CPU2000 and Olden benchmark suites. On average, our mechanism provides performance benefits within 5% of pure hardware implementations. Furthermore, we demonstrate that running event-driven prefetching threads on top of a baseline with a hardware stride prefetcher yields significant speedups for many programs. Finally, we show that our approach provides competitive performance improvements over other hardware approaches for multi-core execution while executing fewer instructions and requiring considerably less hardware support.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
 |
3
|
John W. C. Fu , Janak H. Patel , Bob L. Janssens, Stride directed prefetching in scalar processors, Proceedings of the 25th annual international symposium on Microarchitecture, p.102-110, December 01-04, 1992, Portland, Oregon, United States
|
| |
4
|
|
| |
5
|
|
| |
6
|
|
 |
7
|
|
 |
8
|
|
 |
9
|
|
| |
10
|
E. Larson, S. Chatterjee, and T. Austin. Mase: a novel infrastructure for detailed microarchitectural modeling. In Proc. Second Intl. Symp. on Performance Analysis of Systems and Software, 2001.
|
 |
11
|
Mikko H. Lipasti , Christopher B. Wilkerson , John Paul Shen, Value locality and load value prediction, Proceedings of the seventh international conference on Architectural support for programming languages and operating systems, p.138-147, October 01-04, 1996, Cambridge, Massachusetts, United States
|
 |
12
|
|
 |
13
|
|
 |
14
|
|
| |
15
|
|
| |
16
|
|
 |
17
|
|
| |
18
|
|
| |
19
|
|
| |
20
|
|
| |
21
|
P. Shivakumar and N. P. Jouppi. Cacti 3.0: An integrated cache timing, power, and area model. Tech. report WRL-2001-2, Compaq Western Research Laboratory, December 2001.
|
 |
22
|
|
| |
23
|
|
| |
24
|
Perry H. Wang , Jamison D. Collins , Hong Wang , Dongkeun Kim , Bill Greene , Kai-Ming Chan , Aamir B. Yunus , Terry Sych , Stephen F. Moore , John P. Shen, Helper Threads via Virtual Multithreading, IEEE Micro, v.24 n.6, p.74-82, November 2004
[doi> 10.1109/MM.2004.75]
|
 |
25
|
|
| |
26
|
|
|