Abstract
Reproducing executions of multithreaded programs is very challenging because of many intrinsic and external sources of non-determinism. Existing record-and-replay (RnR) systems have made significant progress in reducing performance overhead, but none targets the in-situ setting, in which replay occurs within the same process that performed the recording. Moreover, most existing work cannot achieve identical replay, which may prevent some errors from being reproduced.
This paper presents iReplayer, which aims to identically replay multithreaded programs in the original process (the "in-situ" setting). Because its replay is both in-situ and identical, iReplayer is more likely to reproduce errors, and it can directly employ debugging mechanisms (e.g., watchpoints) to aid failure diagnosis. Currently, iReplayer incurs only 3% performance overhead on average, which is low enough for it to be always enabled in production environments. iReplayer can serve as the basis for a range of tools, and this paper presents three examples: two automatic tools for detecting buffer overflows and use-after-free bugs, and one interactive debugging tool integrated with GDB.
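The abstract does not describe iReplayer's mechanism, so the following is only a toy sketch of the general idea behind in-situ replay, written under simplifying assumptions: threads touch shared state only while holding one lock, the lock records the order in which threads acquire it, the shared state is rolled back to a checkpoint inside the same process, and the threads are re-executed while the recorded acquisition order is enforced. `RecordReplayLock`, `worker`, `run_epoch`, and `record_then_replay` are hypothetical names invented for this illustration. This is not iReplayer's implementation, which must also handle the other intrinsic and external sources of non-determinism mentioned above.

```python
import copy
import threading


class RecordReplayLock:
    """Hypothetical toy lock: in "record" mode it logs which thread acquires it,
    in order; in "replay" mode it forces threads to acquire it in that order."""

    def __init__(self):
        self._lock = threading.Lock()       # the real mutual-exclusion lock
        self._cond = threading.Condition()  # guards the replay cursor
        self.mode = "record"
        self.log = []                       # thread names in acquisition order
        self._next = 0                      # replay cursor into self.log

    def start_replay(self):
        # Called between epochs, while no worker threads are running.
        with self._cond:
            self.mode = "replay"
            self._next = 0

    def acquire(self):
        name = threading.current_thread().name
        if self.mode == "record":
            self._lock.acquire()
            self.log.append(name)           # appended while holding the lock
            return
        with self._cond:
            # Block until this thread is next in the recorded order.
            self._cond.wait_for(
                lambda: self._next < len(self.log) and self.log[self._next] == name
            )
        self._lock.acquire()
        with self._cond:
            self._next += 1                 # hand the turn to the next thread
            self._cond.notify_all()

    def release(self):
        self._lock.release()


def worker(lock, shared, tid, iters):
    for _ in range(iters):
        lock.acquire()
        try:
            shared.append(tid)              # result order depends on the lock schedule
        finally:
            lock.release()


def run_epoch(lock, shared, n_threads=3, iters=5):
    # Stable thread names let the replay match threads to the recorded log.
    threads = [
        threading.Thread(target=worker, args=(lock, shared, t, iters), name=f"T{t}")
        for t in range(n_threads)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()


def record_then_replay():
    shared = []
    checkpoint = copy.deepcopy(shared)      # snapshot taken before the epoch
    lock = RecordReplayLock()

    run_epoch(lock, shared)                 # record: the scheduler picks the order
    recorded = list(shared)

    shared[:] = checkpoint                  # roll back in place, in the same process
    lock.start_replay()
    run_epoch(lock, shared)                 # replay: the recorded order is enforced
    return recorded, list(shared)


if __name__ == "__main__":
    recorded, replayed = record_then_replay()
    print(recorded == replayed)
```

Because only the lock acquisition order is logged, this style of replay reproduces an execution only when all shared data is accessed under the recorded lock; a data race would escape the log. Since the rollback and re-execution happen inside the original process, a debugger already attached to it (for example, one with a watchpoint set on `shared`) could observe the replayed execution directly.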
iReplayer: in-situ and identical record-and-replay for multithreaded applications
PLDI 2018: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation