ABSTRACT
Memory access instrumentation is fundamental to many applications such as software transactional memory systems, profiling tools and race detectors. We examine the problem of efficiently instrumenting memory accesses in x86 machine code to support software transactional memory and profiling. We aim to automatically instrument all shared memory accesses in critical sections of x86 binaries, while achieving overhead close to that obtained when performing manual instrumentation at the source code level.
The two primary options in building such an instrumentation system are static and dynamic binary rewriting: the former instruments binaries at link time before execution, while the latter binary rewriting instruments binaries at runtime. Static binary rewriting offers extremely low overhead but is hampered by the limits of static analysis. Dynamic binary rewriting is able to use runtime information but typically incurs higher overhead. This paper proposes an alternative: hybrid binary rewriting. Hybrid binary rewriting is built around the idea of a persistent instrumentation cache (PIC) that is associated with a binary and contains instrumented code from it. It supports two execution modes when using instrumentation: active and passive modes. In the active execution mode, a dynamic binary rewriting engine (PIN) is used to intercept execution, and generate instrumentation into the PIC, which is an on-disk file. This execution mode can take full advantage of runtime information. Later, passive execution can be used where instrumented code is executed out of the PIC. This allows us to attain overheads similar to those incurred with static binary rewriting.
This instrumentation methodology enables a variety of static and dynamic techniques to be applied. For example, in passive mode, execution occurs directly from the original executable save for regions that require instrumentation. This has allowed us to build a low-overhead transactional memory profiler. We also demonstrate how we can use the combination of static and dynamic techniques to eliminate instrumentation for accesses to locations that are thread-private.
- M. Abadi, T. Harris, and M. Mehrara. Transactional memory with strong atomicity using off-the-shelf memory protection hardware. In Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 185--196, 2009. Google ScholarDigital Library
- V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: a transparent dynamic optimization system. In Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation, pages 1--12, 2000. Google ScholarDigital Library
- D. Bruening, E. Duesterwald, and S. Amarasinghe. Design and implementation of a dynamic optimization framework for windows. In 4th ACM Workshop on Feedback-Directed and Dynamic Optimization, 2000.Google Scholar
- D. Bruening and V. Kiriansky. Process-shared and persistent code caches. In VEE '08: Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments, pages 61--70, 2008. Google ScholarDigital Library
- C. Cao Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP: Stanford transactional applications for multi-processing. In Proceedings of the IEEE International Symposium on Workload Characterization, pages 35--46, 2008.Google ScholarCross Ref
- D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In Proceedings of the 20th International Symposium on Distributed Computing, pages 194--208, 2006. Google ScholarDigital Library
- A. Dragojevic, Y. Ni, and A.-R. Adl-Tabatabai. Optimizing transactions for captured memory. In Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, pages 214--222, 2009. Google ScholarDigital Library
- A. Eustace and A. Srivastava. Atom: a flexible interface for building high performance program analysis tools. In Proceedings of the USENIX 1995 Technical Conference Proceedings, pages 303--314, 1995. Google ScholarDigital Library
- V. Gajinov, F. Zyulkyarov, O. S. Unsal, A. Cristal, E. Ayguade, T. Harris, and M. Valero. Quake™: parallelizing a complex sequential application using transactional memory. In Proceedings of the 23rd international conference on Supercomputing, pages 126--135, 2009. Google ScholarDigital Library
- C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. In Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, pages 190--200, 2005. Google ScholarDigital Library
- N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, pages 89--100, 2007. Google ScholarDigital Library
- M. Olszewski, J. Cutler, and J. G. Steffan. Judostm: A dynamic binary-rewriting approach to software transactional memory. In Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, pages 365--375, 2007. Google ScholarDigital Library
- M. Payer and T. R. Gross. Generating low-overhead dynamic binary translators. In Proceedings of the 3rd Annual Haifa Experimental Systems Conference, pages 22:1--22:14, 2010. Google ScholarDigital Library
- P. Ratanaworabhan, M. Burtscher, D. Kirovski, B. Zorn, R. Nagpal, and K. Pattabiraman. Detecting and tolerating asymmetric races. In Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 173--184, 2009. Google ScholarDigital Library
- V. J. Reddi, D. Connors, R. Cohn, and M. D. Smith. Persistent code caching: Exploiting code reuse across executions and applications. In Proceedings of the International Symposium on Code Generation and Optimization, pages 74--88, 2007. Google ScholarDigital Library
- A. Roy, S. Hand, and T. Harris. Exploring the limits of disjoint access parallelism. In Proceedings of the First USENIX conference on Hot topics in parallelism, 2009. Google ScholarDigital Library
- B. Schwarz, S. Debray, G. Andrews, and M. Legendre. PLTO: A link-time optimizer for the Intel IA-32 architecture. In Proceedings of the 2001 Workshop on Binary Translation, 2001.Google Scholar
- J. Seward and N. Nethercote. Using valgrind to detect undefined value errors with bit-precision. In Proceedings of the annual conference on USENIX Annual Technical Conference, 2005. Google ScholarDigital Library
- S. Sridhar, J. S. Shapiro, and P. P. Bungale. HDTrans: a low-overhead dynamic translator. SIGARCH Computer Architecture News, 35(1):135--140, 2007. Google ScholarDigital Library
- T. Usui, R. Behrends, J. Evans, and Y. Smaragdakis. Adaptive locks: Combining transactions and locks for efficient concurrency. In Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques, pages 3--14, 2009. Google ScholarDigital Library
- L. Van Put, D. Chanet, B. De Bus, B. De Sutter, and K. De Bosschere. DIABLO: a reliable, retargetable and extensible link-time rewriting framework. In Proceedings of the 2005 IEEE International Symposium On Signal Processing And Information Technology, pages 7--12, 2005.Google ScholarCross Ref
- C. von Praun, R. Bordawekar, and C. Cascaval. Modeling optimistic concurrency using quantitative dependence analysis. In Proceedings of the 13th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 185--196, 2008. Google ScholarDigital Library
- A. Welc, B. Saha, and A.-R. Adl-Tabatabai. Irrevocable transactions and their applications. In Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures, pages 285--296, 2008. Google ScholarDigital Library
Index Terms
- Hybrid binary rewriting for memory access instrumentation
Recommendations
Hybrid binary rewriting for memory access instrumentation
VEE '11Memory access instrumentation is fundamental to many applications such as software transactional memory systems, profiling tools and race detectors. We examine the problem of efficiently instrumenting memory accesses in x86 machine code to support ...
Anywhere, any-time binary instrumentation
PASTE '11: Proceedings of the 10th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software toolsThe Dyninst binary instrumentation and analysis framework distinguishes itself from other binary instrumentation tools through its abstract, machine independent interface; its emphasis on anywhere, any-time binary instrumentation; and its low overhead ...
Efficient, sensitivity resistant binary instrumentation
ISSTA '11: Proceedings of the 2011 International Symposium on Software Testing and AnalysisBinary instrumentation allows users to inject new code into programs without requiring source code, symbols, or debugging information. Instrumenting a binary requires structural modifications such as moving code, adding new code, and overwriting ...
Comments