
VM Live Migration At Scale

Published: 25 March 2018

Abstract

Uninterrupted uptime is a critical aspect of Virtual Machines (VMs) offered by cloud hosting providers. Google's VMs run on top of rapidly changing infrastructure: we regularly update hardware and host software, and we must quickly respond to failing hardware. Frequent change is critical to both development velocity (deploying new versions of services and infrastructure) and the ability to respond rapidly to defects, including critical security fixes. Typically these updates would be disruptive, resulting in VM termination or restart. In this paper we present how we use VM live migration at scale to eliminate this disruption with minimal impact to the guest, performing over 1,000,000 migrations monthly in our production fleet, with a 50 ms median blackout and a 300 ms 99th-percentile blackout.

  • Published in

    ACM SIGPLAN Notices, Volume 53, Issue 3
    VEE '18
    March 2018
    99 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/3296975

    VEE '18: Proceedings of the 14th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
    March 2018
    106 pages
    ISBN: 9781450355797
    DOI: 10.1145/3186411

    Copyright © 2018 Owner/Author

    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • tutorial
    • Research
    • Refereed limited
