skip to main content
10.1145/1023833.1023870acmconferencesArticle/Chapter ViewAbstractPublication PagesesweekConference Proceedingsconference-collections
Article

Procedure placement using temporal-ordering information: dealing with code size expansion

Published: 22 September 2004 Publication History

Abstract

In a direct-mapped instruction cache, all instructions that have the same memory address modulo the cache size, share a common and unique cache slot. Instruction cache conflicts can be partially handled at linked time by procedure placement. Pettis and Hansen give in [1] an algorithm that reorders procedures in memory by aggregating them in a greedy fashion. The Gloy and Smith algorithm [2] greatly decreases the number of con ict-misses but increases the code size by allowing gaps between procedures. The latter contains two main stages: the cache-placement phase assigns modulo addresses to minimizes cache-conflicts; the memory-placement phase assigns final memory addresses under the modulo placement constraints, and minimizes the code size expansion. In this paper: (1) we state the NP-completeness of the cache-placement problem; (2) we provide an optimal algorithm to the memory-placement problem with complexity O(n min(n; L) log* (n)) (n is the number of procedures, L the cache size); (3) we take final program size into consideration during the cache-placement phase. Our modifications to the Gloy and Smith algorithm gives on average a code size expansion of 8% over the original program size, while the initial algorithm gave an expansion of 177%. The cache miss reduction is nearly the same as the Gloy and Smith solution with 35% cache miss reduction.

References

[1]
K. Pettis and R. C. Hansen. Profile guided code positioning. ACM SIGPLAN Notices, 25(6):16--27, June 1990.
[2]
N. Gloy and M. D. Smith. Procedure placement using temporal-ordering information. ACM Transactions on Programming Languages and Systems (TOPLAS), 21(5):977--1027, 1999.
[3]
P. Faraboschi, G. Brown, J. A. Fisher, G. Desoli, and F. Homewood. Lx: a technology platform for customizable VLIW embedded processing. In The 27th Annual International Symposium on Computer architecture 2000, pages 203--213, New York, NY, USA, 2000. ACM Press.
[4]
A. H. Hashemi, D. R. Kaeli, and B. Calder. Efficient procedure mapping using cache line coloring. In Proceedings of the 1997 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 171--182, 1997.
[5]
J. Kalamatianos and D. R. Kaeli. Temporal-based procedure reordering for improved instruction cache performance. In Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, pages 244--253, Las Vegas, Nevada, January 31-February 4, 1998. IEEE Computer Society TCCA.
[6]
T. Bidault, C. Guillon, F. Bouchez, and F. Rastello. Procedure placement using temporal-ordering information: dealing with code size expansion. Technical Report RR-04-16, LIP, ENS Lyon, France, april 2004.
[7]
T. H. Cormen, C. E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. The MIT Press, 1990.
[8]
S. R. Buss and P. N. Yianilos. Linear and O( n log n ) time minimum-cost matching algorithms for quasi-convex tours. SIAM Journal on Computing, 27(1):170--201, February 1998.
[9]
Open64 Compiler Tools. http://open64.sourceforge.net.

Cited By

View all
  • (2013)An analytical approach for fast and accurate design space exploration of instruction cachesACM Transactions on Embedded Computing Systems10.1145/2539036.253903913:3(1-29)Online publication date: 24-Dec-2013
  • (2012)An automatic code overlaying technique for multicores with explicitly-managed memory hierarchiesProceedings of the Tenth International Symposium on Code Generation and Optimization10.1145/2259016.2259045(219-229)Online publication date: 31-Mar-2012
  • (2012)Automatic code overlay generation and partially redundant code fetch eliminationACM Transactions on Architecture and Code Optimization (TACO)10.1145/2207222.22072269:2(1-32)Online publication date: 15-Jun-2012
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
CASES '04: Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
September 2004
324 pages
ISBN:1581138903
DOI:10.1145/1023833
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 September 2004

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Hamiltonian-path
  2. cache miss
  3. code placement
  4. code size
  5. instruction cache
  6. min-matching
  7. profiling

Qualifiers

  • Article

Conference

CASES04

Acceptance Rates

Overall Acceptance Rate 52 of 230 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 16 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2013)An analytical approach for fast and accurate design space exploration of instruction cachesACM Transactions on Embedded Computing Systems10.1145/2539036.253903913:3(1-29)Online publication date: 24-Dec-2013
  • (2012)An automatic code overlaying technique for multicores with explicitly-managed memory hierarchiesProceedings of the Tenth International Symposium on Code Generation and Optimization10.1145/2259016.2259045(219-229)Online publication date: 31-Mar-2012
  • (2012)Automatic code overlay generation and partially redundant code fetch eliminationACM Transactions on Architecture and Code Optimization (TACO)10.1145/2207222.22072269:2(1-32)Online publication date: 15-Jun-2012
  • (2011)WCET-driven cache-aware code positioningProceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems10.1145/2038698.2038722(145-154)Online publication date: 9-Oct-2011
  • (2010)Improved procedure placement for set associative cachesProceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems10.1145/1878921.1878944(147-156)Online publication date: 24-Oct-2010
  • (2010)Task Assignment with Cache Partitioning and Locking for WCET Minimization on MPSoCProceedings of the 2010 39th International Conference on Parallel Processing10.1109/ICPP.2010.65(573-582)Online publication date: 13-Sep-2010

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media