ABSTRACT
Checkpoint replication is a prevalent way of maintaining virtual machine availability in the presence of host failures. Since checkpoint replication can impose heavy load on network resources, checkpoint compression has been suggested to reduce network usage. This paper presents the first detailed evaluation and characterization of the effectiveness and overheads of checkpoint compression methods for various workloads frequently seen in high-availability systems. We propose a lightweight compression method that exploits similarities in checkpoints to eliminate redundant network traffic, and compare it with two well-known methods, gzip and delta compression. Our results show that gzip and delta compression reduce network traffic significantly for various workloads, but incur high CPU and memory overheads, respectively. The proposed similarity compression is most effective for VM clusters running homogeneous workloads, while using both CPU and memory efficiently. Based on our extensive evaluation, we suggest guidelines for selecting and using these compression methods.
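The trade-off among the three methods can be illustrated with a minimal sketch. It is not the paper's implementation: the 4 KB page size, the SHA-1 digests, the XOR-plus-zlib delta encoding, and every function name below are assumptions made only for illustration. Each function returns the number of bytes that would cross the network for one checkpoint, represented as a list of memory pages.

```python
import gzip
import hashlib
import os
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes (not stated in the abstract)


def gzip_bytes(pages):
    """Bytes sent when the whole checkpoint is gzip-compressed (CPU cost)."""
    return len(gzip.compress(b"".join(pages)))


def delta_bytes(cached_pages, pages):
    """Bytes sent when each page is XORed against a cached previous copy.

    Keeping cached_pages around is the memory cost of this approach.
    """
    deltas = b"".join(
        bytes(a ^ b for a, b in zip(old, new))
        for old, new in zip(cached_pages, pages)
    )
    # Unchanged bytes XOR to zero, so a fast compression level is enough.
    return len(zlib.compress(deltas, 1))


def similarity_bytes(pages, seen_digests):
    """Bytes sent when pages already seen are replaced by their digest."""
    sent = 0
    for page in pages:
        digest = hashlib.sha1(page).digest()
        if digest in seen_digests:
            sent += len(digest)  # duplicate: send a 20-byte reference only
        else:
            seen_digests.add(digest)
            sent += len(page)  # first sighting: send the full page
    return sent


def dirty(pages, nbytes=16):
    """Return copies of pages with the first nbytes of each page flipped."""
    return [bytes(b ^ 0xFF for b in p[:nbytes]) + p[nbytes:] for p in pages]
```

For example, with `base = [os.urandom(PAGE_SIZE) for _ in range(8)]`, the call `delta_bytes(base, dirty(base))` yields far fewer bytes than `gzip_bytes(dirty(base))`, because random page contents are incompressible while their XOR deltas are mostly zero. Calling `similarity_bytes(base, seen)` twice with the same `seen` set mimics two VMs that hold identical pages: the second call sends only digests.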
Index Terms
- Tradeoffs in compressing virtual machine checkpoints