skip to main content
article
Open access

Adapting branch-target buffer to improve the target predictability of java code

Published: 01 June 2005 Publication History

Abstract

Java programs are increasing in popularity and prevalence on numerous platforms, including high-performance general-purpose processors. The success of Java technology largely depends on the efficiency in executing the portable Java bytecodes. However, the dynamic characteristics of the Java runtime system present unique performance challenges for several aspects of microarchitecture design. In this work, we focus on the effects of indirect branches on branch-target address prediction performance. Runtime bytecode translation, just-in-time (JIT) compilation, frequent calls to the native interface libraries, and dependence on virtual methods increase the frequency of polymorphic indirect branches. Therefore, accurate target address prediction for indirect branches is very important for Java code.This paper characterizes the indirect branch behavior in Java processing and proposes an adaptive branch-target buffer (BTB) design to enhance the predictability of the targets. Our characterization shows that a traditional BTB will frequently mispredict a few polymorphic indirect branches, significantly deteriorating predictor accuracy in Java processing. Therefore, we propose a rehashable branch-target buffer (R-BTB), which dynamically identifies polymorphic indirect branches and adapts branch-target storage to accommodate multiple targets for a branch.The R-BTB improves the target predictability of indirect branches without sacrificing overall target prediction accuracy. Simulations show that the R-BTB eliminates 61% of the indirect branch mispredictions suffered with a traditional BTB for Java programs running in interpreter mode (46% in JIT mode), which leads to a 57% decrease in overall target address misprediction rate (29% in JIT mode). With an equivalent number of entries, the R-BTB also outperforms the previously proposed target cache scheme for a majority of Java programs by adapting to a greater variety of indirect branch behaviors.

Supplementary Material

ZIP File (p109-li.zip)
This zip file contains three structured document full-text XML representations of the article plus the final, self-contained source LaTeX file, which can be compiled on most LaTeX systems. The three XML files all present images as gifs. They differ in their representations of math: one has gifs, one has embedded LaTeX, and one has MathML. Save the zip to disk and expand into its four folders.

References

[1]
Aigner, G. and Hölzle, U. 1996. Eliminating virtual calls in C++ programs. In Proceedings of 10th European Conference on Object Oriented Programming. 142--166.]]
[2]
Calder, B., Grunwald, D., and Zorn, B. 1994a. Quantifying behavioral differences between C and C++ programs. J. Programming Languages. 2, 4, 323--351.]]
[3]
Calder, B. and Grunwald, D. 1994b. Reducing indirect function call overhead in C++ programs, In Proceedings of the International Symposium on Principles of Programming Languages. 397--408.]]
[4]
Chang, P. Y., Hao, E., and Patt, Y. N. 1997. Target prediction for indirect jumps. In Proceedings of the 24th International Symposium on Computer Architecture. 274--283.]]
[5]
Driesen, K. and Hölzle, U. 1998a. Accurate indirect branch prediction. In Proceedings of the 25th Annual International Symposium on Computer Architecture. 167--178.]]
[6]
Driesen, K. and Hölzle, U. 1998b. The cascaded predictor: Economical and adaptive branch target prediction. In Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture. 249--258.]]
[7]
Dean, J., Defouw, G., Grove, D., Litvinov, V., and Chambers, C. 1996. Vortex: An optimizing compiler for object-oriented languages. In Proceedings of ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications. 83--100.]]
[8]
Ertl, A. M. and Gregg, D. 2001. The behavior of efficient virtual machine interpreters on modern architectures. In Proceedings of Euro-Par 2001. 403--412.]]
[9]
Ertl, A. M. and Gregg, D. 2003. Optimizing indirect branch prediction accuracy in virtual machine interpreters. In Proceedings of SIGPLAN'03 Conference on Programming Language Design and Implementation. 278--288.]]
[10]
Hsieh, C. H. A., Conte, M. T., Johnson, T. L., Gyllenhaal, J. C., and Hwu, W. W. 1997. A study of the cache and branch performance issues with running java on current hardware platforms. In Proceedings of COMPCON. 211--216.]]
[11]
Kalamatianos, J. and Kaeil, D. R. A. 2000. Predicting indirect branches via data compression. In Proceedings of the International Symposium on Microarchitecture, 1998.]]
[12]
Lee, J. and Smith, A. 1984. Branch prediction strategies and branch-target buffer design. IEEE Computer 17, 1, 1984.]]
[13]
Li, T., John, L. K., Vijaykrishnan, N., Sivasubramaniam A., Sabarinathan, J., and Murthy, A. 2000. Using complete system simulation to characterize SPECjvm98 benchmarks. In Proceedings of ACM International Conference on Supercomputing. 22--33.]]
[14]
Li, T. and John, L. K. 2001. Understanding control-flow transfer and its predictability in java processing. In Proceedings of the 2001 IEEE International Symposium on Performance Analysis of Systems and Software. 65--76.]]
[15]
Li, T., Bhargava R., and John, L. K. 2002. Rehashable BTB: An adaptive branch-target buffer to improve the target predictability of java code. In Proceedings of the 9th International High Performance Computing Conference. 597--608.]]
[16]
Lindholm, T. and Yellin, F. 1999. The Java Virtual Machine Specification. 2nd Ed. Addison Wesley, Reading, MA.]]
[17]
Radhakrishnan, R., Vijaykrishnan, N., John, L. K., and Sivasubramaniam, A. 2000. Architectural issue in java runtime systems. In Proceedings of the 6th International Conference on High Performance Computer Architecture. 387--398.]]
[18]
Reinman, G. and Jouppi, N. P. 2000. CACTI2.0: An integrated cache timing and power model. WRL Research Report, July, 2000.]]
[19]
Rosenblum, M., Herrod, S. A., Witchel, E., and Gupta, A. 1995. Complete computer system simulation: The SimOS approach. IEEE Parallel Distributed Technology: Systems Applications, 3, 4, 34--43.]]
[20]
SPEC JVM98 Benchmarks, http://www.spec.org/osg/jvm98/.]]
[21]
Vlaovic, S., Davidson E. S., and Atyson, G. S. 2000. Improving BTB performance in the presence of DLLs. In Proceedings of the International Symposium on Microarchitecture. 77--86.]]
[22]
Vijaykrishnan, N. and Ranganathan, N. 1999. Tuning branch predictors to support virtual method invocation in java. In Proceedings of the 5th USENIX Conference of Object-Oriented Technologies and Systems. 217--228.]]
[23]
Yeh, T. and Patt, Y. 1995. A comparison of dynamic branch predictors that use two levels of branch history. In Proceedings of the 22nd Annual International Symposium on Computer Architecture. 257--266.]]

Cited By

View all
  • (2023)BiRFIA: Selective Binary Rewriting for Function Interception on ARMProceedings of the 37th International Conference on Supercomputing10.1145/3577193.3593701(87-98)Online publication date: 21-Jun-2023
  • (2022)SaBRe: load-time selective binary rewritingInternational Journal on Software Tools for Technology Transfer (STTT)10.1007/s10009-021-00644-w24:2(205-223)Online publication date: 21-Jan-2022
  • (2021)PDede: Partitioned, Deduplicated, Delta Branch Target BufferMICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3466752.3480046(779-791)Online publication date: 18-Oct-2021
  • Show More Cited By

Recommendations

Reviews

Herbert G. Mayer

A new architecture that will improve the precision of indirect branch (IB) prediction in Java code is proposed in this paper. Two percent of average object code for C++ contains IBs, while object code for Java and interpreted Java has ten times as many IBs. A large percentage of these Java IBs is polymorphic (namely, has varying target addresses at runtime). Unfortunately, branch prediction with conventional branch target buffer (BTB) methods works poorly for IBs, thus impeding execution speed. The new method uses a rehashable BTB (R-BTB), resulting in highly improved polymorphic branch (PB) prediction. Raising the general target accuracy with moderate hardware cost is achieved by focusing dynamically on a small, critical subset of IBs, and by providing added resources in the R-BTB to store numerous target addresses. The authors use a simulator, allowing full instrumentation of the tested runtime environment, and quantification of a detailed cost-benefit analysis. The paper shows the resulting architectural solution with speedup numbers. After an introduction in section 1, section 2 outlines the simulation framework SimOS, running on IRIX 5.3, and the benchmark used to measure the performance results. Section 3 analyzes the number of IBs in Java code, both for PBs and monomorphic IBs. Details of the rehashable BTB are explained in section 4, while section 5 weighs architecture design trade-offs. Section 6 evaluates the performance of the R-BTB, by comparing its misprediction rate with that of traditional BTBs. Section 7 is a superb survey of related work, and section 8 concludes the paper. Though section 5 is too terse in explaining the various architectural trade-offs of the new R-BTB design, the overall paper is clear and well written, and the subject matter is important. The paper articulates the incremental value of the new branch prediction method. With the growing acceptance of just in time (JIT) compiled mode and interpreted mode in Java, the new ideas offer significant value. R-BTBs can exploit the amazingly great locality of runtime addresses even for polymorphic IBs. Every architect working on a new generation of processors will have to comprehend the key message in this paper, and decide whether or not to implement R-BTB. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization
ACM Transactions on Architecture and Code Optimization  Volume 2, Issue 2
June 2005
111 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/1071604
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2005
Published in TACO Volume 2, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Computer architecture
  2. Java
  3. branch prediction
  4. branch-target buffer (BTB)
  5. pipelined processor

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)135
  • Downloads (Last 6 weeks)23
Reflects downloads up to 15 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2023)BiRFIA: Selective Binary Rewriting for Function Interception on ARMProceedings of the 37th International Conference on Supercomputing10.1145/3577193.3593701(87-98)Online publication date: 21-Jun-2023
  • (2022)SaBRe: load-time selective binary rewritingInternational Journal on Software Tools for Technology Transfer (STTT)10.1007/s10009-021-00644-w24:2(205-223)Online publication date: 21-Jan-2022
  • (2021)PDede: Partitioned, Deduplicated, Delta Branch Target BufferMICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3466752.3480046(779-791)Online publication date: 18-Oct-2021
  • (2019)JumpswitchesProceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference10.5555/3358807.3358832(285-299)Online publication date: 10-Jul-2019
  • (2014)Leveraging dynamic slicing to enhance indirect branch prediction2014 IEEE 32nd International Conference on Computer Design (ICCD)10.1109/ICCD.2014.6974696(292-299)Online publication date: Oct-2014
  • (2013)A novel architecture for ahead branch predictionFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-013-2260-x7:6(914-923)Online publication date: 1-Dec-2013
  • (2012)CVPProceedings of the 26th ACM international conference on Supercomputing10.1145/2304576.2304593(111-120)Online publication date: 25-Jun-2012
  • (2012)Compiler techniques to improve dynamic branch prediction for indirect jump and call instructionsACM Transactions on Architecture and Code Optimization10.1145/2086696.20867038:4(1-20)Online publication date: 26-Jan-2012
  • (2010)Low-cost class caching mechanism for Java SoCProceedings of 2010 IEEE International Symposium on Circuits and Systems10.1109/ISCAS.2010.5537744(3753-3756)Online publication date: May-2010
  • (2009)Phantom-BTBACM SIGARCH Computer Architecture News10.1145/2528521.150828137:1(313-324)Online publication date: 7-Mar-2009
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media