ABSTRACT
As multicore processors become prevalent in modern computer systems, there is a growing need to increase hardware utilization and exploit the parallelism of such platforms. With virtualization technology, hardware utilization is improved by encapsulating independent workloads into virtual machines (VMs) and consolidating them onto the same machine. SMP virtual machines have been widely adopted to exploit parallelism. For virtualized systems, such as a public cloud, fairness between tenants and the efficiency of running their applications are keys to success. However, we find that existing virtualization platforms fail to enforce fairness between VMs with different numbers of virtual CPUs (vCPUs) that run on multiple CPUs. We attribute the unfairness to the use of per-CPU schedulers and to load imbalance across these CPUs, which together lead to inaccurate CPU allocations. Unfortunately, existing approaches to reducing unfairness, e.g., dynamic load balancing and CPU capping, introduce significant inefficiencies for parallel workloads.
In this paper, we present Flex, a vCPU scheduling scheme that enforces fairness at the VM level and improves the efficiency of hosted parallel applications. Flex centers on two key designs: (1) dynamically adjusting vCPU weights (FlexW) on multiple CPUs to achieve VM-level fairness and (2) flexibly scheduling vCPUs (FlexS) to minimize wasted busy-waiting time. We have implemented Flex in Xen and performed comprehensive evaluations with various parallel workloads. Results show that Flex achieves CPU allocations with, on average, no more than 5% error compared to the ideal fair allocation. Further, Flex outperforms Xen's credit scheduler and two representative co-scheduling approaches by as much as 10X for parallel applications using busy-waiting or blocking synchronization methods.
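The fairness problem and the idea of weight adjustment can be illustrated with a toy model. The sketch below is not the FlexW algorithm, which the abstract does not specify. It assumes a hypothetical setup in which each CPU runs an independent proportional-share scheduler, vCPUs are pinned and always runnable, and the ideal VM-level allocation is simply the total CPU capacity divided in proportion to VM weights. All function names, the placement, and the multiplicative feedback rule are assumptions made for illustration.

```python
def vm_allocations(placement, vcpu_weight):
    """CPU time (in units of one CPU) each VM receives when every CPU
    independently shares its time in proportion to vCPU weights."""
    alloc = {vm: 0.0 for vm in vcpu_weight}
    for vms_on_cpu in placement:
        total = sum(vcpu_weight[vm] for vm in vms_on_cpu)
        for vm in vms_on_cpu:
            alloc[vm] += vcpu_weight[vm] / total
    return alloc


def ideal_allocations(vm_weight, num_cpus):
    """VM-level fair share: total capacity split in proportion to VM weight.
    Assumption: no VM is capped by its vCPU count."""
    total = sum(vm_weight.values())
    return {vm: num_cpus * w / total for vm, w in vm_weight.items()}


def mean_relative_error(alloc, ideal):
    """Average relative deviation from the ideal fair allocation."""
    return sum(abs(alloc[vm] - ideal[vm]) / ideal[vm] for vm in ideal) / len(ideal)


def adjust_weights(placement, vm_weight, rounds=20):
    """Hypothetical feedback loop: start from VM weight split evenly over
    the VM's vCPUs, then scale each VM's vCPU weight by ideal / observed."""
    ideal = ideal_allocations(vm_weight, len(placement))
    vcpu_count = {vm: sum(cpu.count(vm) for cpu in placement) for vm in vm_weight}
    vcpu_weight = {vm: vm_weight[vm] / vcpu_count[vm] for vm in vm_weight}
    for _ in range(rounds):
        alloc = vm_allocations(placement, vcpu_weight)
        vcpu_weight = {vm: w * ideal[vm] / alloc[vm] for vm, w in vcpu_weight.items()}
    return vcpu_weight


# Two CPUs with unbalanced load: VMs A and B have two vCPUs each, C has one.
placement = [["A", "B"], ["A", "B", "C"]]
vm_weight = {"A": 1.0, "B": 1.0, "C": 1.0}
ideal = ideal_allocations(vm_weight, len(placement))

static = {"A": 0.5, "B": 0.5, "C": 1.0}  # VM weight split evenly across vCPUs
before = vm_allocations(placement, static)
after = vm_allocations(placement, adjust_weights(placement, vm_weight))

print("ideal :", ideal)
print("static:", before, "error:", mean_relative_error(before, ideal))
print("tuned :", after, "error:", mean_relative_error(after, ideal))
```

In this toy placement, splitting each VM's weight evenly over its vCPUs still leaves the single-vCPU VM short of its fair share, because it competes with two other vCPUs on the more heavily loaded CPU. Rescaling the weights from observed allocations closes the gap, which is the general effect that per-CPU weight adjustment aims for.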
Towards fair and efficient SMP virtual machine scheduling. PPoPP '14.