DOI: 10.1145/1811039.1811057
research-article

Transparent, lightweight application execution replay on commodity multiprocessor operating systems

Published: 14 June 2010

ABSTRACT

We present Scribe, the first system to provide transparent, low-overhead application record-replay and the ability to go live from replayed execution. Scribe introduces new lightweight operating system mechanisms, rendezvous and sync points, to efficiently record nondeterministic interactions such as related system calls, signals, and shared memory accesses. Rendezvous points make a partial ordering of execution based on system call dependencies sufficient for replay, avoiding the recording overhead of maintaining an exact execution ordering. Sync points convert asynchronous interactions that can occur at arbitrary times into synchronous events that are much easier to record and replay.
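The partial-ordering idea behind rendezvous points can be illustrated with a small user-space model. This is a minimal sketch under stated assumptions, not Scribe's kernel implementation: the names `Recorder`, `syscall`, `replay`, and the string resource labels are hypothetical, and each system call is assumed to touch exactly one shared resource. Recording stamps every call with a per-resource sequence number; replay lets a process proceed only when its next call is at the head of that resource's order, so calls on unrelated resources may run in a different global order than they did during recording.

```python
from collections import defaultdict


class Recorder:
    """Toy recorder: stamps each call with a per-resource sequence number."""

    def __init__(self):
        self.next_seq = defaultdict(int)  # resource -> next sequence number
        self.log = defaultdict(list)      # pid -> [(resource, seq), ...]

    def syscall(self, pid, resource):
        # Only calls that touch the same resource are ordered relative to
        # each other; no global ordering of all calls is stored.
        seq = self.next_seq[resource]
        self.next_seq[resource] = seq + 1
        self.log[pid].append((resource, seq))


def replay(log, schedule):
    """Replay per-process logs under an arbitrary schedule of pids.

    A process whose next call is not yet first in its resource's recorded
    order waits at the rendezvous point (here: it simply loses its turn).
    Returns the calls in the order they actually executed.
    """
    expected = defaultdict(int)  # resource -> sequence number allowed next
    cursor = defaultdict(int)    # pid -> index of its next logged call
    executed = []
    for pid in schedule:
        calls = log.get(pid, [])
        i = cursor[pid]
        if i >= len(calls):
            continue  # this process has finished its log
        resource, seq = calls[i]
        if seq != expected[resource]:
            continue  # blocked: an earlier dependent call has not run yet
        expected[resource] += 1
        cursor[pid] += 1
        executed.append((pid, resource, seq))
    return executed


# Recorded run: process 1 and process 2 share a pipe; the file and the
# socket are each used by a single process.
rec = Recorder()
rec.syscall(1, "pipe")
rec.syscall(2, "pipe")
rec.syscall(2, "file")
rec.syscall(1, "sock")

# The replay scheduler tries process 2 first, but its pipe call must wait.
order = replay(rec.log, schedule=[2, 1, 1, 2, 2])
```

In this run the socket call of process 1 replays before the pipe call of process 2, even though it was recorded after it. The dependency on the shared pipe is still respected, which is all the replay needs in this model.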

We have implemented Scribe without changing, relinking, or recompiling applications, libraries, or operating system kernels, and without any specialized hardware support such as hardware performance counters. It works on commodity Linux operating systems, and commodity multi-core and multiprocessor hardware. Our results show for the first time that an operating system mechanism can correctly and transparently record and replay multi-process and multi-threaded applications on commodity multiprocessors. Scribe recording overhead is less than 2.5% for server applications including Apache and MySQL, and less than 15% for desktop applications including Firefox, Acrobat, OpenOffice, parallel kernel compilation, and movie playback.


              Published in

              SIGMETRICS '10: Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
              June 2010
              398 pages
              ISBN: 9781450300384
              DOI: 10.1145/1811039

              ACM SIGMETRICS Performance Evaluation Review, Volume 38, Issue 1
              June 2010
              382 pages
              ISSN: 0163-5999
              DOI: 10.1145/1811099

              Copyright © 2010 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States


              Acceptance Rates

              Overall Acceptance Rate: 459 of 2,691 submissions, 17%
