ABSTRACT
Dynamic Binary Translators and Optimizers (DBTOs) have become a common architecture in recent years. They are used in many different systems, such as emulators, instrumentation tools, and innovative HW/SW co-designed microarchitectures. Although many researchers have worked on characterizing and reducing the emulation overhead, there are no published results that explain how the DBTO behaves from the microarchitectural perspective or how its behavior may be predicted from high-level guest application statistics. Such results are important for guiding design decisions and system optimization.
In this paper we study the DBTO as an independent application by dividing its functionality into modules. We show that the behavior of the DBTO is far from constant. The contribution of the different modules to the total overhead, the overhead itself, the microarchitectural interaction with the emulated application, and the microarchitectural profile of the different modules all change significantly with the emulated application. This result contrasts with numerous papers that consider this behavior constant and exclude the DBTO from the simulation. Throughout this paper we detail this variance, quantify it, and explain the reasons behind it.
The insights presented in this work can be exploited towards the design of more efficient DBTOs and their early performance evaluation.
Index Terms
- Performance analysis and predictability of the software layer in dynamic binary translators/optimizers