Article

Procedure placement using temporal-ordering information: dealing with code size expansion

Authors:

Christophe Guillon,

Fabrice Rastello,

Thierry Bidault,

Florent BouchezAuthors Info & Claims

CASES '04: Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems

Pages 268 - 279

https://doi.org/10.1145/1023833.1023870

Published: 22 September 2004 Publication History

Get Access

Abstract

In a direct-mapped instruction cache, all instructions that have the same memory address modulo the cache size, share a common and unique cache slot. Instruction cache conflicts can be partially handled at linked time by procedure placement. Pettis and Hansen give in [1] an algorithm that reorders procedures in memory by aggregating them in a greedy fashion. The Gloy and Smith algorithm [2] greatly decreases the number of con ict-misses but increases the code size by allowing gaps between procedures. The latter contains two main stages: the cache-placement phase assigns modulo addresses to minimizes cache-conflicts; the memory-placement phase assigns final memory addresses under the modulo placement constraints, and minimizes the code size expansion. In this paper: (1) we state the NP-completeness of the cache-placement problem; (2) we provide an optimal algorithm to the memory-placement problem with complexity O(n min(n; L) log* (n)) (n is the number of procedures, L the cache size); (3) we take final program size into consideration during the cache-placement phase. Our modifications to the Gloy and Smith algorithm gives on average a code size expansion of 8% over the original program size, while the initial algorithm gave an expansion of 177%. The cache miss reduction is nearly the same as the Gloy and Smith solution with 35% cache miss reduction.

References

[1]

K. Pettis and R. C. Hansen. Profile guided code positioning. ACM SIGPLAN Notices, 25(6):16--27, June 1990.

Digital Library

Google Scholar

[2]

N. Gloy and M. D. Smith. Procedure placement using temporal-ordering information. ACM Transactions on Programming Languages and Systems (TOPLAS), 21(5):977--1027, 1999.

Digital Library

Google Scholar

[3]

P. Faraboschi, G. Brown, J. A. Fisher, G. Desoli, and F. Homewood. Lx: a technology platform for customizable VLIW embedded processing. In The 27th Annual International Symposium on Computer architecture 2000, pages 203--213, New York, NY, USA, 2000. ACM Press.

Digital Library

Google Scholar

[4]

A. H. Hashemi, D. R. Kaeli, and B. Calder. Efficient procedure mapping using cache line coloring. In Proceedings of the 1997 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 171--182, 1997.

Digital Library

Google Scholar

[5]

J. Kalamatianos and D. R. Kaeli. Temporal-based procedure reordering for improved instruction cache performance. In Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, pages 244--253, Las Vegas, Nevada, January 31-February 4, 1998. IEEE Computer Society TCCA.

Digital Library

Google Scholar

[6]

T. Bidault, C. Guillon, F. Bouchez, and F. Rastello. Procedure placement using temporal-ordering information: dealing with code size expansion. Technical Report RR-04-16, LIP, ENS Lyon, France, april 2004.

Google Scholar

[7]

T. H. Cormen, C. E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. The MIT Press, 1990.

Digital Library

Google Scholar

[8]

S. R. Buss and P. N. Yianilos. Linear and O( n log n ) time minimum-cost matching algorithms for quasi-convex tours. SIAM Journal on Computing, 27(1):170--201, February 1998.

Digital Library

Google Scholar

[9]

Open64 Compiler Tools. http://open64.sourceforge.net.

Google Scholar

Cited By

View all

Liang YMitra T(2013)An analytical approach for fast and accurate design space exploration of instruction cachesACM Transactions on Embedded Computing Systems10.1145/2539036.253903913:3(1-29)Online publication date: 24-Dec-2013
https://dl.acm.org/doi/10.1145/2539036.2539039
Jang CLee JSeo SLee JEidt CHoller ASrinivasan UAmarasinghe S(2012)An automatic code overlaying technique for multicores with explicitly-managed memory hierarchiesProceedings of the Tenth International Symposium on Code Generation and Optimization10.1145/2259016.2259045(219-229)Online publication date: 31-Mar-2012
https://dl.acm.org/doi/10.1145/2259016.2259045
Jang CLee JEgger BRyu S(2012)Automatic code overlay generation and partially redundant code fetch eliminationACM Transactions on Architecture and Code Optimization (TACO)10.1145/2207222.22072269:2(1-32)Online publication date: 15-Jun-2012
https://dl.acm.org/doi/10.1145/2207222.2207226
Show More Cited By

Index Terms

Procedure placement using temporal-ordering information: dealing with code size expansion

Recommendations

Procedure placement using temporal-ordering information: Dealing with code size expansion
Cache exploitation in embedded systems

In a direct-mapped instruction cache, all instructions that have the same memory address modulo the cache size share a common and unique cache slot. Instruction cache conflicts can be partially handled at linked time by procedure placement. Pettis and ...
Improved procedure placement for set associative caches
CASES '10: Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems

The performance of most embedded systems is critically dependent on the memory hierarchy performance. In particular, higher cache hit rate can provide significant performance boost to an embedded application. Procedure placement is a popular technique ...
Procedure placement using temporal-ordering information

Instruction cache performance is important to instruction fetch efficiency and overall processor performance. The layout of an executable has a substantial effect on the cache miss rate and the instruction working set size during execution. This means ...

Comments

Information & Contributors

Information

Published In

CASES '04: Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems

September 2004

324 pages

ISBN:1581138903

DOI:10.1145/1023833

General Chairs:
Mary Jane Irwin
Pennsylvania State University
,
Wei Zhao
Texas Instruments
,
Program Chairs:
Luciano Lavagno
Politecnico di Torino/Cadence Labs
,
Scott Mahlke
University of Michigan, Ann Arbor, MI

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 September 2004

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Article

Conference

CASES04

Sponsor:

CASES04: 2004 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

September 22 - 25, 2004

Washington DC, USA

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

6
Total Citations
View Citations
321
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 16 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

View all

Liang YMitra T(2013)An analytical approach for fast and accurate design space exploration of instruction cachesACM Transactions on Embedded Computing Systems10.1145/2539036.253903913:3(1-29)Online publication date: 24-Dec-2013
https://dl.acm.org/doi/10.1145/2539036.2539039
Jang CLee JSeo SLee JEidt CHoller ASrinivasan UAmarasinghe S(2012)An automatic code overlaying technique for multicores with explicitly-managed memory hierarchiesProceedings of the Tenth International Symposium on Code Generation and Optimization10.1145/2259016.2259045(219-229)Online publication date: 31-Mar-2012
https://dl.acm.org/doi/10.1145/2259016.2259045
Jang CLee JEgger BRyu S(2012)Automatic code overlay generation and partially redundant code fetch eliminationACM Transactions on Architecture and Code Optimization (TACO)10.1145/2207222.22072269:2(1-32)Online publication date: 15-Jun-2012
https://dl.acm.org/doi/10.1145/2207222.2207226
Falk HKotthaus HGupta RMooney V(2011)WCET-driven cache-aware code positioningProceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems10.1145/2038698.2038722(145-154)Online publication date: 9-Oct-2011
https://dl.acm.org/doi/10.1145/2038698.2038722
Liang YMitra TKathail VTatge RBarua R(2010)Improved procedure placement for set associative cachesProceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems10.1145/1878921.1878944(147-156)Online publication date: 24-Oct-2010
https://dl.acm.org/doi/10.1145/1878921.1878944
Liu TZhao YLi MXue C(2010)Task Assignment with Cache Partitioning and Locking for WCET Minimization on MPSoCProceedings of the 2010 39th International Conference on Parallel Processing10.1109/ICPP.2010.65(573-582)Online publication date: 13-Sep-2010
https://dl.acm.org/doi/10.1109/ICPP.2010.65

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Abstract

References

Cited By

Index Terms

Recommendations