skip to main content
10.1145/1952682.1952711acmconferencesArticle/Chapter ViewAbstractPublication PagesveeConference Proceedingsconference-collections
research-article

Hybrid binary rewriting for memory access instrumentation

Authors Info & Claims
Published:09 March 2011Publication History

ABSTRACT

Memory access instrumentation is fundamental to many applications such as software transactional memory systems, profiling tools and race detectors. We examine the problem of efficiently instrumenting memory accesses in x86 machine code to support software transactional memory and profiling. We aim to automatically instrument all shared memory accesses in critical sections of x86 binaries, while achieving overhead close to that obtained when performing manual instrumentation at the source code level.

The two primary options in building such an instrumentation system are static and dynamic binary rewriting: the former instruments binaries at link time before execution, while the latter binary rewriting instruments binaries at runtime. Static binary rewriting offers extremely low overhead but is hampered by the limits of static analysis. Dynamic binary rewriting is able to use runtime information but typically incurs higher overhead. This paper proposes an alternative: hybrid binary rewriting. Hybrid binary rewriting is built around the idea of a persistent instrumentation cache (PIC) that is associated with a binary and contains instrumented code from it. It supports two execution modes when using instrumentation: active and passive modes. In the active execution mode, a dynamic binary rewriting engine (PIN) is used to intercept execution, and generate instrumentation into the PIC, which is an on-disk file. This execution mode can take full advantage of runtime information. Later, passive execution can be used where instrumented code is executed out of the PIC. This allows us to attain overheads similar to those incurred with static binary rewriting.

This instrumentation methodology enables a variety of static and dynamic techniques to be applied. For example, in passive mode, execution occurs directly from the original executable save for regions that require instrumentation. This has allowed us to build a low-overhead transactional memory profiler. We also demonstrate how we can use the combination of static and dynamic techniques to eliminate instrumentation for accesses to locations that are thread-private.

References

  1. M. Abadi, T. Harris, and M. Mehrara. Transactional memory with strong atomicity using off-the-shelf memory protection hardware. In Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 185--196, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: a transparent dynamic optimization system. In Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation, pages 1--12, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. D. Bruening, E. Duesterwald, and S. Amarasinghe. Design and implementation of a dynamic optimization framework for windows. In 4th ACM Workshop on Feedback-Directed and Dynamic Optimization, 2000.Google ScholarGoogle Scholar
  4. D. Bruening and V. Kiriansky. Process-shared and persistent code caches. In VEE '08: Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments, pages 61--70, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. Cao Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP: Stanford transactional applications for multi-processing. In Proceedings of the IEEE International Symposium on Workload Characterization, pages 35--46, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  6. D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In Proceedings of the 20th International Symposium on Distributed Computing, pages 194--208, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Dragojevic, Y. Ni, and A.-R. Adl-Tabatabai. Optimizing transactions for captured memory. In Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, pages 214--222, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Eustace and A. Srivastava. Atom: a flexible interface for building high performance program analysis tools. In Proceedings of the USENIX 1995 Technical Conference Proceedings, pages 303--314, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. V. Gajinov, F. Zyulkyarov, O. S. Unsal, A. Cristal, E. Ayguade, T. Harris, and M. Valero. Quake™: parallelizing a complex sequential application using transactional memory. In Proceedings of the 23rd international conference on Supercomputing, pages 126--135, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: building customized program analysis tools with dynamic instrumentation. In Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, pages 190--200, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, pages 89--100, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Olszewski, J. Cutler, and J. G. Steffan. Judostm: A dynamic binary-rewriting approach to software transactional memory. In Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, pages 365--375, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Payer and T. R. Gross. Generating low-overhead dynamic binary translators. In Proceedings of the 3rd Annual Haifa Experimental Systems Conference, pages 22:1--22:14, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. P. Ratanaworabhan, M. Burtscher, D. Kirovski, B. Zorn, R. Nagpal, and K. Pattabiraman. Detecting and tolerating asymmetric races. In Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 173--184, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. V. J. Reddi, D. Connors, R. Cohn, and M. D. Smith. Persistent code caching: Exploiting code reuse across executions and applications. In Proceedings of the International Symposium on Code Generation and Optimization, pages 74--88, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Roy, S. Hand, and T. Harris. Exploring the limits of disjoint access parallelism. In Proceedings of the First USENIX conference on Hot topics in parallelism, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. B. Schwarz, S. Debray, G. Andrews, and M. Legendre. PLTO: A link-time optimizer for the Intel IA-32 architecture. In Proceedings of the 2001 Workshop on Binary Translation, 2001.Google ScholarGoogle Scholar
  18. J. Seward and N. Nethercote. Using valgrind to detect undefined value errors with bit-precision. In Proceedings of the annual conference on USENIX Annual Technical Conference, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S. Sridhar, J. S. Shapiro, and P. P. Bungale. HDTrans: a low-overhead dynamic translator. SIGARCH Computer Architecture News, 35(1):135--140, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. T. Usui, R. Behrends, J. Evans, and Y. Smaragdakis. Adaptive locks: Combining transactions and locks for efficient concurrency. In Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques, pages 3--14, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. L. Van Put, D. Chanet, B. De Bus, B. De Sutter, and K. De Bosschere. DIABLO: a reliable, retargetable and extensible link-time rewriting framework. In Proceedings of the 2005 IEEE International Symposium On Signal Processing And Information Technology, pages 7--12, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  22. C. von Praun, R. Bordawekar, and C. Cascaval. Modeling optimistic concurrency using quantitative dependence analysis. In Proceedings of the 13th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 185--196, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. A. Welc, B. Saha, and A.-R. Adl-Tabatabai. Irrevocable transactions and their applications. In Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures, pages 285--296, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Hybrid binary rewriting for memory access instrumentation

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      VEE '11: Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
      March 2011
      250 pages
      ISBN:9781450306874
      DOI:10.1145/1952682
      • cover image ACM SIGPLAN Notices
        ACM SIGPLAN Notices  Volume 46, Issue 7
        VEE '11
        July 2011
        231 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/2007477
        Issue’s Table of Contents

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 March 2011

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate80of235submissions,34%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader