skip to main content
article

Code placement for improving dynamic branch prediction accuracy

Published: 12 June 2005 Publication History

Abstract

Code placement techniques have traditionally improved instruction fetch bandwidth by increasing instruction locality and decreasing the number of taken branches. However, traditional code placement techniques have less benefit in the presence of a trace cache that alters the placement of instructions in the instruction cache. Moreover, as pipelines have become deeper to accommodate increasing clock rates, branch misprediction penalties have become a significant impediment to performance. We evaluate pattern history table partitioning, a feedback directed code placement technique that explicitly places conditional branches so that they are less likely to interfere destructively with one another in branch prediction tables. On SPEC CPU benchmarks running on an Intel Pentium 4, branch mispredictions are reduced by up to 22% and 3.5% on average. This reduction yields a speedup of up to 16.0% and 4.5% on average. By contrast, branch alignment, a previous code placement technique, yields only up to a 4.7% speedup and less than 1% on average.

References

[1]
T. Ball and J. Larus. Branch prediction for free. In Proceedings of the SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 300--313, June 1993.
[2]
D. Burger and T. M. Austin. The SimpleScalar tool set version 2.0. Technical Report 1342, Computer Sciences Department, University of Wisconsin, June 1997.
[3]
B. Calder, D. Grunwald, M. Jones, D. Lindsay, J. Martin, M. Mozer, and B. Zorn. Evidence-based static branch prediction using machine learning. ACM Transactions on Programming Languages and Systems, 19(1):188--222, 1997.
[4]
B. Calder and D. Grunwald. Reducing branch costs via branch alignment. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, October 1994.
[5]
P.-Y. Chang and U. Banerjee. Branch classification: a new mechanism for improving branch predictor performance. In Proceedings of the 27th International Symposium on Microarchitecture, November 1994.
[6]
C.-M. Chen and C.-Ta King. Walk-time address adjustment for improving the accuracy of dynamic branch prediction. IEEE Transactions on Computers, 48(5):457--469, May 1999.
[7]
K. Diefendorff. K7 challenges Intel. Microprocessor Report, 12(14), October 1998.
[8]
D. J. Hatfield and J. Gerald. Program restructuring for virtual memory. IBM Systems Journal, 10(3):168--192, 1971.
[9]
A. N. Eden and T. N. Mudge. The YAGS branch prediction scheme. In Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, pages 69--80, November 1998.
[10]
M. Evers, P.-Y. Chang, and Y. N. Patt. Using hybrid branch predictors to improve branch prediction accuracy in the presence of context switches. In Proceedings of the 23rd International Symposium on Computer Architecture, May 1996.
[11]
N. Gloy and M. D. Smith. Procedure placement using Temporal-Ordering information. ACM Transactions on Programming Languages and Systems, 21(5):977--1027, September 1999.
[12]
B. Hayes. Differences in optimizing for the Pentium 4 processor vs. the Pentium III processor. Intel Developer Services, http://www.intel.com/ cd/ ids/developer/ asmo-na/eng/44010.htm.
[13]
Intel Corporation. Intel Pentium 4 processor optimization. Technical Report Order Number: 248966, Intel Corporation, 2001.
[14]
D. A. Jiménez, S. W. Keckler, and C. Lin. The impact of delay on the design of branch predictors. In Proceedings of the 33rd Annual International Symposium on Microarchitecture (MICRO-33), pages 67--76, December 2000.
[15]
R. E. Kessler. The Alpha 21264 microprocessor. IEEE Micro, 19(2):24--36, March/April 1999.
[16]
S. P. Kim and G. S. Tyson. Analyzing the working set characteristics of branch execution. In Proceedings of the 31rd Annual International Symposium on Microarchitecture, November 1998.
[17]
C.-C. Lee, C. C. Chen, and T. N. Mudge. The bi-mode branch predictor. In Proceedings of the 30th Annual International Symposium on Microarchitecture, pages 4--13, November 1997.
[18]
J. Levon. Oprofile - a system profiler for linux. Technical report, http://oprofile.sourceforge.net/ (Current on September 23, 2004).
[19]
S. McFarling. Program optimization for instruction caches. In Proceedings of the third international conference on Architectural support for programming languages and operating systems, pages 183--191, 1988.
[20]
S. McFarling. Program optimization for instruction caches. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 183--191. ACM, 1989.
[21]
S. McFarling. Combining branch predictors. Technical Report TN-36m, Digital Western Research Laboratory, June 1993.
[22]
P. Michaud, A. Seznec, and R. Uhlig. Trading conflict and capacity aliasing in conditional branch predictors. In Proceedings of the 24th International Symposium on Computer Architecture, pages 292--303, June 1997.
[23]
M. Milenkovic, A. Milenkovic, and J. Kulick. Microbenchmarks for determining branch predictor organization. Software Practice and Experience, 34:465--487, April 2004.
[24]
H. Patil and J. Emer. Combining static and dynamic branch prediction to reduce destructive aliasing. In Proceedings of the 6th International Symposium on High Performance Computer Architecture, January 2000.
[25]
K. Pettis and R. C. Hansen. Profile guided code positioning. In Proceedings of the ACM SIGPLAN'90 Conference on Programming Language Design and Implementation, pages 16--27, June 1990.
[26]
A. Ramírez, J. Larriba-Pey, C. Navarro, J. Torrellas, and M. Valero. Software trace cache. In Proceedings of the 13th International Conference on Supercomputing, Rhodes, Greece, June 1999.
[27]
A. Ramirez, J. L. Larriba-Pey, and M. Valero. The effect of code reordering on branch prediction. In Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques. IEEE Computer Society Press, October 15--19, 2000.
[28]
E. Rotenberg, S. Bennett, and J. E. Smith. Trace cache: A low latency approach to high bandwidth instruction fetching. In Proceedings of the 29th International Symposium on Microarchitecture, December 1996.
[29]
E. Sprangle, R.S. Chappell, M. Alsup, and Y. N. Patt. The Agree predictor: A mechanism for reducing negative branch history interference. In Proceedings of the 24th International Symposium on Computer Architecture, pages 284--291, June 1997.
[30]
Standard Performance Evaluation Corporation. SPEC CPU 2000, http://www.spec.org/osg/cpu2000, April 2000.
[31]
T.-Y. Yeh and Y. N. Patt. Two-level adaptive branch prediction. In Proceedings of the 24th ACM/IEEE International Symposium on Microarchitecture, pages 51--61, November 1991.
[32]
C. Young, D. S. Johnson, D. R. Karger, and M. D. Smith. Near-optimal intraprocedural branch alignment. In Proceedings of the SIGPLAN'97 Conference on Program Language Design and Implementation, June 1997.
[33]
C. Young and M. D. Smith. Static correlated branch prediction. ACM Transactions on Programming Languages and Systems, May 1999.

Cited By

View all
  • (2023)PopArt: Ranked Testing EfficiencyIEEE Transactions on Software Engineering10.1109/TSE.2022.321479649:4(2221-2238)Online publication date: 1-Apr-2023
  • (2022)Whisper: Profile-Guided Branch Misprediction Elimination for Data Center ApplicationsProceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture10.1109/MICRO56248.2022.00017(19-34)Online publication date: 1-Oct-2022
  • (2024)Research on Link Optimization Technology Based on Basic Block Reordering2024 5th International Symposium on Computer Engineering and Intelligent Communications (ISCEIC)10.1109/ISCEIC63613.2024.10810167(491-495)Online publication date: 8-Nov-2024
  • Show More Cited By

Index Terms

  1. Code placement for improving dynamic branch prediction accuracy

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 40, Issue 6
    Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
    June 2005
    325 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1064978
    Issue’s Table of Contents
    • cover image ACM Conferences
      PLDI '05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
      June 2005
      338 pages
      ISBN:1595930566
      DOI:10.1145/1065010
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 June 2005
    Published in SIGPLAN Volume 40, Issue 6

    Check for updates

    Author Tags

    1. branch prediction
    2. compilers

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)10
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 02 Mar 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)PopArt: Ranked Testing EfficiencyIEEE Transactions on Software Engineering10.1109/TSE.2022.321479649:4(2221-2238)Online publication date: 1-Apr-2023
    • (2022)Whisper: Profile-Guided Branch Misprediction Elimination for Data Center ApplicationsProceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture10.1109/MICRO56248.2022.00017(19-34)Online publication date: 1-Oct-2022
    • (2024)Research on Link Optimization Technology Based on Basic Block Reordering2024 5th International Symposium on Computer Engineering and Intelligent Communications (ISCEIC)10.1109/ISCEIC63613.2024.10810167(491-495)Online publication date: 8-Nov-2024
    • (2017)Optimizing function placement for large-scale data-center applicationsProceedings of the 2017 International Symposium on Code Generation and Optimization10.5555/3049832.3049858(233-244)Online publication date: 4-Feb-2017
    • (2017)Optimizing function placement for large-scale data-center applications2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO.2017.7863743(233-244)Online publication date: Feb-2017
    • (2014)Code Layout Optimization for Defensiveness and Politeness in Shared CacheProceedings of the 2014 Brazilian Conference on Intelligent Systems10.1109/ICPP.2014.24(151-161)Online publication date: 18-Oct-2014
    • (2013)STABILIZERACM SIGPLAN Notices10.1145/2499368.245114148:4(219-228)Online publication date: 16-Mar-2013
    • (2013)STABILIZERACM SIGARCH Computer Architecture News10.1145/2490301.245114141:1(219-228)Online publication date: 16-Mar-2013
    • (2013)A proper performance evaluation system that summarizes code placement effectsProceedings of the 11th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering10.1145/2462029.2462035(41-48)Online publication date: 20-Jun-2013
    • (2013)STABILIZERProceedings of the eighteenth international conference on Architectural support for programming languages and operating systems10.1145/2451116.2451141(219-228)Online publication date: 16-Mar-2013
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media