A hardware mechanism for dynamic extraction and relayout of program hot spots
Source: International Symposium on Computer Architecture
Proceedings of the 27th Annual International Symposium on Computer Architecture
Vancouver, British Columbia, Canada
Pages: 59 - 70  
Year of Publication: 2000
ISBN: 1-58113-232-8
Authors
Matthew C. Merten  Coordinated Science Lab, 1308 West Main Street, MC-228 Urbana, IL
Andrew R. Trick  Coordinated Science Lab, 1308 West Main Street, MC-228 Urbana, IL
Erik M. Nystrom  Coordinated Science Lab, 1308 West Main Street, MC-228 Urbana, IL
Ronald D. Barnes  Coordinated Science Lab, 1308 West Main Street, MC-228 Urbana, IL
Wen-mei W. Hwu  Coordinated Science Lab, 1308 West Main Street, MC-228 Urbana, IL
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/339647.339655

ABSTRACT

This paper presents a new mechanism for collecting and deploying runtime-optimized code. The code-collection component resides in the instruction retirement stage and lays out hot execution paths to improve the instruction fetch rate and to enable further code optimization. The code-deployment component uses an extension to the Branch Target Buffer to migrate execution into the new code without modifying the original code. These components add no significant delay to the total execution time of the program. The code collection scheme enables safe runtime optimization along paths that span function boundaries. This technique provides a better platform for runtime optimization than trace caches, because the traces are longer and persist in main memory across context switches. Additionally, these traces are less susceptible to transient behavior because they are restricted to frequently executed code. Empirical results show that, on average, this mechanism achieves better instruction fetch rates using only 12 KB of hardware than a trace cache requiring 15 KB of hardware, while producing long, persistent traces better suited to optimization.


