A hardware mechanism for dynamic extraction and relayout of program hot spots
Source
International Symposium on Computer Architecture
Proceedings of the 27th annual international symposium on Computer architecture
Vancouver, British Columbia, Canada
Pages: 59 - 70
Year of Publication: 2000
ISBN: 1-58113-232-8

Authors

Matthew C. Merten
Coordinated Science Lab, 1308 West Main Street, MC-228 Urbana, IL

Andrew R. Trick
Coordinated Science Lab, 1308 West Main Street, MC-228 Urbana, IL

Erik M. Nystrom
Coordinated Science Lab, 1308 West Main Street, MC-228 Urbana, IL

Ronald D. Barnes
Coordinated Science Lab, 1308 West Main Street, MC-228 Urbana, IL

Wen-mei W. Hwu
Coordinated Science Lab, 1308 West Main Street, MC-228 Urbana, IL

ABSTRACT
This paper presents a new mechanism for collecting and deploying runtime optimized code. The code-collecting component resides in the instruction retirement stage and lays out hot execution paths to improve instruction fetch rate as well as enable further code optimization. The code deployment component uses an extension to the Branch Target Buffer to migrate execution into the new code without modifying the original code. No significant delay is added to the total execution of the program due to these components. The code collection scheme enables safe runtime optimization along paths that span function boundaries. This technique provides a better platform for runtime optimization than trace caches, because the traces are longer and persist in main memory across context switches. Additionally, these traces are not as susceptible to transient behavior because they are restricted to frequently executed code. Empirical results show that on average this mechanism can achieve better instruction fetch rates using only 12KB of hardware than a trace cache requiring 15KB of hardware, while producing long, persistent traces more suited to optimization.
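
The deployment idea in the abstract, redirecting execution into relaid-out code through the Branch Target Buffer while leaving the original code untouched, can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract does not describe the actual BTB extension, so the `remap_target` field, the `RemappingBTB` class, and all method names below are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BTBEntry:
    target: int                          # branch target in the original code
    remap_target: Optional[int] = None   # hypothetical extension: target in the relaid-out copy


class RemappingBTB:
    """Toy model of a BTB whose entries can redirect fetch to relaid-out code."""

    def __init__(self) -> None:
        self.entries: Dict[int, BTBEntry] = {}

    def record_branch(self, branch_pc: int, target: int) -> None:
        # Ordinary BTB behavior: remember where this branch goes.
        self.entries[branch_pc] = BTBEntry(target)

    def install_remap(self, original_target: int, new_target: int) -> int:
        # Assumed deployment step: every branch that enters the hot spot at
        # original_target is redirected to the relaid-out copy at new_target.
        # Returns the number of entries that were updated.
        updated = 0
        for entry in self.entries.values():
            if entry.target == original_target:
                entry.remap_target = new_target
                updated += 1
        return updated

    def predict(self, branch_pc: int) -> Optional[int]:
        # Prefer the remapped target when one is installed; the original
        # target is kept, so the original code is never modified.
        entry = self.entries.get(branch_pc)
        if entry is None:
            return None
        if entry.remap_target is not None:
            return entry.remap_target
        return entry.target


btb = RemappingBTB()
btb.record_branch(0x100, 0x400)    # branch at 0x100 enters a hot spot at 0x400
btb.install_remap(0x400, 0x9000)   # relaid-out copy assumed to live at 0x9000
print(hex(btb.predict(0x100)))     # prints 0x9000
```

In this model the redirection lives entirely in the predictor state, so dropping the remap entry sends execution back to the original code, which matches the abstract's claim that the original code is not modified.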

CITED BY 10

Sanjay J. Patel, Tony Tung, Satarupa Bose, Matthew M. Crum, Increasing the size of atomic instruction blocks using control flow assertions, Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, p.303-313, December 2000, Monterey, California, United States

Ravi Bhargava, Juan Rubio, Srikanth Kannan, Lizy K. John, David Christie, Leo Klaes, Understanding the impact of X86/NT computing on microarchitecture, Workload characterization of emerging computer applications, Kluwer Academic Publishers, Norwell, MA, 2001

Brian Fahs, Satarupa Bose, Matthew Crum, Brian Slechta, Francesco Spadini, Tony Tung, Sanjay J. Patel, Steven S. Lumetta, Performance characterization of a hardware mechanism for dynamic optimization, Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, December 01-05, 2001, Austin, Texas

Lucian Popa, Irina Athanasiu, Costin Raiciu, Raju Pandey, Radu Teodorescu, Using code collection to support large applications on mobile devices, Proceedings of the 10th annual international conference on Mobile computing and networking, September 26-October 01, 2004, Philadelphia, PA, USA

Tipp Moseley, Alex Shye, Vijay Janapa Reddi, Matthew Iyer, Dan Fay, David Hodgdon, Joshua L. Kihm, Alex Settle, Dirk Grunwald, Daniel A. Connors, Dynamic run-time architecture techniques for enabling continuous optimization, Proceedings of the 2nd conference on Computing frontiers, May 04-06, 2005, Ischia, Italy

Howard Chen, Wei-Chung Hsu, Jiwei Lu, Pen-Chung Yew, Dong-Yuan Chen, Dynamic trace selection using performance monitoring hardware sampling, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, March 23-26, 2003, San Francisco, California

Yuan Chou, Pazhani Pillai, Herman Schmit, John Paul Shen, PipeRench implementation of the instruction path coprocessor, Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, p.147-158, December 2000, Monterey, California, United States