research-article
Open Access

Dual-Page Checkpointing: An Architectural Approach to Efficient Data Persistence for In-Memory Applications

Published: 08 January 2019

Abstract

Many in-memory applications require data persistence, but disk-based persistence slows them down substantially. Emerging non-volatile memory (NVM) offers an opportunity to achieve in-memory data persistence at DRAM-level performance. However, NVM typically requires a software library to manage persistent data, which adds significant overhead.

This article demonstrates that a hardware-based, high-frequency checkpointing mechanism can achieve efficient in-memory data persistence on NVM. To maintain checkpoint consistency, traditional logging and copy-on-write techniques incur excessive NVM writes that impair both the performance and the endurance of NVM; recent work attempts to solve this issue but requires a large amount of metadata in the memory controller. We therefore design a new dual-page checkpointing system that achieves low metadata cost while eliminating most of the excessive NVM writes, breaking the traditional trade-off between metadata space cost and extra data writes. According to our evaluation with OLTP, graph computing, and machine-learning workloads, our solution outperforms state-of-the-art NVM software libraries by 13.6× in throughput, and it causes 34% less NVM wear-out and delivers 1.28× higher throughput than state-of-the-art hardware checkpointing solutions.
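To make the "excessive NVM writes" claim concrete, the following is a minimal toy model, not the article's mechanism. It counts block writes in one checkpoint interval under three simplified schemes. The page and block sizes, the function names, and the cost rules (two writes per dirty block for logging, one full-page rewrite per touched page for copy-on-write) are all illustrative assumptions.

```python
# Toy write-count model (illustrative assumption, not the article's mechanism).
BLOCKS_PER_PAGE = 64  # assumed: 4 KiB pages made of 64-byte blocks


def ideal_writes(dirty_blocks):
    """Lower bound: each modified block reaches NVM exactly once."""
    return len(dirty_blocks)


def logging_writes(dirty_blocks):
    """Block-granularity logging: one log write plus one in-place write per block."""
    return 2 * len(dirty_blocks)


def cow_writes(dirty_blocks):
    """Page-granularity copy-on-write: each touched page is rewritten in full."""
    touched_pages = {block // BLOCKS_PER_PAGE for block in dirty_blocks}
    return len(touched_pages) * BLOCKS_PER_PAGE


# Hypothetical checkpoint interval: four dirty blocks spread over three pages.
dirty = {0, 1, 64, 200}
print(ideal_writes(dirty), logging_writes(dirty), cow_writes(dirty))  # 4 8 192
```

Under these assumptions, logging doubles the write count, and page-level copy-on-write amplifies sparse updates far more. Both effects cost performance and endurance, which is the overhead the abstract attributes to the traditional techniques.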

      • Published in

        ACM Transactions on Architecture and Code Optimization  Volume 15, Issue 4
        December 2018
        706 pages
        ISSN: 1544-3566
        EISSN: 1544-3973
        DOI: 10.1145/3284745

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 8 January 2019
        • Revised: 1 October 2018
        • Accepted: 1 October 2018
        • Received: 1 March 2018

        Qualifiers

        • research-article
        • Research
        • Refereed
