research-article
Open Access

QuMan: Profile-based Improvement of Cluster Utilization

Published: 28 August 2018

Abstract

Modern data centers consolidate workloads to increase server utilization, reduce total cost of ownership, and cope with scaling limitations. However, sharing server resources introduces performance interference across applications and, consequently, increases performance volatility, which negatively affects user experience. A challenging problem, therefore, is to increase server utilization while maintaining application QoS.

In this article, we present QuMan, a server resource manager that uses application isolation and profiling to increase server utilization while controlling the degradation of application QoS. Previous solutions either estimate interference across applications and then restrict colocation to “compatible” applications, or assume that application requirements are known. Instead, QuMan estimates the resources each application requires, uses an isolation mechanism to create properly sized resource slices for applications, and colocates applications arbitrarily. QuMan’s mechanisms can be combined with a variety of admission control policies, and we explore the potential of two such policies: (1) a policy that allows users to specify a minimum performance threshold, and (2) an automated policy that operates without user input and is based on a new combined QoS-utilization metric. We implement QuMan on top of Linux servers and evaluate its effectiveness using containers and real applications. Our single-node results show that QuMan balances the tradeoff between server utilization and application performance highly effectively: it achieves 80% server utilization while the performance of each application does not drop below 80% of its standalone performance. We also deploy QuMan on a cluster of 100 AWS instances managed by a modified version of the Sparrow scheduler [37], and we observe a 48% increase in application performance on a highly utilized cluster, compared to the performance of the same cluster under the same load when managed by native Sparrow or Apache Mesos.
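To make the threshold-based admission policy concrete, the following is a minimal sketch of the idea in Python. It is our own simplified illustration, not the algorithm from the article: the function name `admit`, its parameters, and the linear performance model (performance proportional to the fraction of profiled demand a job receives) are all assumptions for exposition.

```python
def admit(job_demand, server_capacity, allocated, min_perf=0.8):
    """Admit a job only if its resource slice keeps estimated performance
    at or above min_perf of its standalone (profiled) performance.

    job_demand      -- resources (e.g., cores) the profile says the job
                       needs to reach standalone performance
    server_capacity -- total resources on the server
    allocated       -- resources already promised to colocated jobs
    min_perf        -- user-specified minimum performance threshold

    All names and the performance model here are illustrative assumptions.
    """
    free = server_capacity - allocated
    if free <= 0:
        return False
    # Size the slice: at most the profiled demand, at most what is free.
    slice_size = min(job_demand, free)
    # Crude model: performance scales linearly with the fraction of the
    # profiled demand the job actually receives.
    est_perf = slice_size / job_demand
    return est_perf >= min_perf
```

For example, a job profiled to need 4 cores is admitted on a 16-core server with 12 cores already allocated (it gets its full slice), but rejected when 13 cores are allocated, since a 3-core slice would put its estimated performance at 0.75, below the 0.8 threshold.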

References

  1. Launching a Spark/Shark Cluster on EC2. Retrieved from http://ampcamp.berkeley.edu/exercises-strata-conf-2013/launching-a-cluster.html.
  2. The Apache HTTP Server. Retrieved from http://httpd.apache.org.
  3. Omar Arif Abdul-Rahman and Kento Aida. 2014. Towards understanding the usage behavior of Google cloud users: The mice and elephants phenomenon. In Proceedings of the 2014 IEEE 6th International Conference on Cloud Computing Technology and Science (CloudCom’14). IEEE, 272--277.
  4. Omid Alipourfard, Hongqiang Harry Liu, Jianshu Chen, Shivaram Venkataraman, Minlan Yu, and Ming Zhang. 2017. CherryPick: Adaptively unearthing the best cloud configurations for big data analytics. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI’17). USENIX Association, 469--482.
  5. Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, and David J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215, 3 (1990), 403--410.
  6. Jens Axboe. Flexible I/O Tester. Retrieved from https://github.com/axboe.
  7. Robert Birke, Lydia Y. Chen, and Evgenia Smirni. 2012. Data centers in the wild: A large performance study. Technical Report Z1204-002. IBM Research -- Zurich. Retrieved from http://domino.research.ibm.com/library/cyberdig.nsf/papers/0C306B31CF0D3861852579E40045F17F/$File/rz3820.pdf.
  8. Leon Bottou and Olivier Bousquet. 2007. The tradeoffs of large scale learning. In Proceedings of the 20th International Conference on Neural Information Processing Systems (NIPS’07). Curran Associates Inc., 161--168.
  9. Shimin Chen, Anastasia Ailamaki, Manos Athanassoulis, Phillip B. Gibbons, Ryan Johnson, Ippokratis Pandis, and Radu Stoica. 2011. TPC-E vs. TPC-C: Characterizing the new TPC-E benchmark via an I/O comparison study. ACM SIGMOD Rec. 39, 3 (2011), 5--10.
  10. Yanpei Chen, Sara Alspaugh, and Randy Katz. 2012. Interactive analytical processing in big data systems: A cross-industry study of MapReduce workloads. Proc. VLDB Endow. 5, 12 (2012), 1802--1813.
  11. Henry Cook, Miquel Moreto, Sarah Bird, Khanh Dao, David A. Patterson, and Krste Asanovic. 2013. A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA’13). ACM, New York, NY, 308--319.
  12. Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc’Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Y. Ng. 2012. Large scale distributed deep networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS’12). Curran Associates Inc., 1223--1231.
  13. Christina Delimitrou and Christos Kozyrakis. 2013. Paragon: QoS-aware scheduling for heterogeneous datacenters. In Proceedings of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’13). ACM, New York, NY, 77--88.
  14. Christina Delimitrou and Christos Kozyrakis. 2014. Quasar: Resource-efficient and QoS-aware cluster management. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’14). ACM, New York, NY, 127--144.
  15. Eiman Ebrahimi, Chang Joo Lee, Onur Mutlu, and Yale N. Patt. 2010. Fairness via source throttling: A configurable and high-performance fairness substrate for multi-core memory systems. In Proceedings of the 15th Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV). ACM, New York, NY, 335--346.
  16. Andrew D. Ferguson, Peter Bodik, Srikanth Kandula, Eric Boutin, and Rodrigo Fonseca. 2012. Jockey: Guaranteed job latency in data parallel clusters. In Proceedings of the 7th ACM European Conference on Computer Systems (EuroSys’12). ACM, New York, NY, 99--112.
  17. Daniel Gmach, Jerry Rolia, Ludmila Cherkasova, and Alfons Kemper. 2007. Workload analysis and demand prediction of enterprise data center applications. In Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization (IISWC’07). IEEE Computer Society, 171--180.
  18. Z. Gong and X. Gu. 2010. PAC: Pattern-driven application consolidation for efficient cloud computing. In Proceedings of the 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems. 24--33.
  19. Zhenhuan Gong, Xiaohui Gu, and J. Wilkes. 2010. PRESS: PRedictive elastic resource scaling for cloud systems. In Proceedings of the 2010 International Conference on Network and Service Management. 9--16.
  20. Robert Grandl, Mosharaf Chowdhury, Aditya Akella, and Ganesh Ananthanarayanan. 2016. Altruistic scheduling in multi-resource clusters. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16). 65.
  21. James Hamilton. 2010. Overall Data Center Costs. Retrieved from http://perspectives.mvdirona.com/2010/09/overall-data-center-costs.
  22. Andrew Herdrich, Ramesh Illikkal, Ravi Iyer, Don Newell, Vineet Chadha, and Jaideep Moses. 2009. Rate-based QoS techniques for cache/memory in CMP platforms. In Proceedings of the 23rd International Conference on Supercomputing (ICS’09). ACM, New York, NY, 479--488.
  23. Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation (NSDI’11). USENIX Association, 295--308.
  24. Ravi Iyer, Li Zhao, Fei Guo, Ramesh Illikkal, Srihari Makineni, Don Newell, Yan Solihin, Lisa Hsu, and Steve Reinhardt. 2007. QoS policies and architecture for cache/memory in CMP platforms. In Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’07). ACM, New York, NY, 25--36.
  25. Sangeetha Abdu Jyothi, Carlo Curino, Ishai Menache, Shravan Matthur Narayanamurthy, Alexey Tumanov, Jonathan Yaniv, Íñigo Goiri, Subru Krishnan, Janardhan Kulkarni, and Sriram Rao. 2016. Morpheus: Towards automated SLOs for enterprise clusters. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16). 117.
  26. Harshad Kasture and Daniel Sanchez. 2014. Ubik: Efficient cache sharing with strict QoS for latency-critical workloads. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’14). ACM, New York, NY, 729--742.
  27. Eugene Kim. 2015. This One Chart Shows The Vicious Price War Going On In Cloud Computing. Retrieved from http://www.businessinsider.com/cloud-computing-price-war-in-one-chart-2015-1.
  28. David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, and Christos Kozyrakis. 2015. Heracles: Improving resource efficiency at scale. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA’15). ACM, New York, NY, 450--462.
  29. Jason Mars, Lingjia Tang, Robert Hundt, Kevin Skadron, and Mary Lou Soffa. 2011. Bubble-Up: Increasing utilization in modern warehouse scale computers via sensible co-locations. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-44). ACM, New York, NY, 248--259.
  30. S. Mavridis, Y. Sfakianakis, A. Papagiannis, M. Marazakis, and A. Bilas. 2014. Jericho: Achieving scalability through optimal data placement on multicore systems. In Proceedings of the 2014 30th Symposium on Mass Storage Systems and Technologies (MSST’14). 1--10.
  31. Asit K. Mishra, Joseph L. Hellerstein, Walfredo Cirne, and Chita R. Das. 2010. Towards characterizing cloud backend workloads: Insights from Google compute clusters. ACM SIGMETRICS Perf. Eval. Rev. 37, 4 (2010), 34--41.
  32. Onur Mutlu and Thomas Moscibroda. 2007. Stall-time fair memory access scheduling for chip multiprocessors. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40). IEEE Computer Society, 146--160.
  33. Onur Mutlu and Thomas Moscibroda. 2008. Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems. In Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA’08). IEEE Computer Society, 63--74.
  34. Raghunath Nambiar, Nicholas Wakou, Forrest Carman, and Michael Majdalany. 2011. Transaction Processing Performance Council (TPC): State of the Council 2010.
  35. Ripal Nathuji, Aman Kansal, and Alireza Ghaffarkhah. 2010. Q-clouds: Managing performance interference effects for QoS-aware clouds. In Proceedings of the 5th European Conference on Computer Systems (EuroSys’10). ACM, New York, NY, 237--250.
  36. Dejan Novaković, Nedeljko Vasić, Stanko Novaković, Dejan Kostić, and Ricardo Bianchini. 2013. DeepDive: Transparently identifying and managing performance interference in virtualized environments. In Proceedings of the 2013 USENIX Conference on Annual Technical Conference (USENIX ATC’13). USENIX Association, 219--230.
  37. Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. 2013. Sparrow: Distributed, low latency scheduling. In Proceedings of the 24th ACM Symposium on Operating Systems Principles. ACM, 69--84.
  38. Benedict Paten, Mark Diekhans, Brian J. Druker, Stephen Friend, Justin Guinney, Nadine Gassner, Mitchell Guttman, W. James Kent, Patrick Mantey, Adam A. Margolin, M. Massie, A. M. Novak, F. Nothaft, L. Pachter, D. Patterson, M. Smuga-Otto, J. M. Stuart, L. Van’t Veer, B. Wold, and D. Haussler. 2015. The NIH BD2K center for big data in translational genomics. J. Am. Med. Inform. Assoc. 22, 6 (Nov. 2015), 1143--1147.
  39. Paul Menage. 2006. Linux Kernel cgroups Documentation. The Linux Kernel Archives: cgroups features, including cpusets and memory controller. http://www.kernel.org/doc/Documentation/cgroups/.
  40. Jeff Rasley, Konstantinos Karanasos, Srikanth Kandula, Rodrigo Fonseca, Milan Vojnovic, and Sriram Rao. 2016. Efficient queue management for cluster scheduling. In Proceedings of the 11th European Conference on Computer Systems. ACM, 36.
  41. Charles Reiss, Alexey Tumanov, Gregory R. Ganger, Randy H. Katz, and Michael A. Kozuch. 2012. Heterogeneity and dynamicity of clouds at scale: Google trace analysis. In Proceedings of the 3rd ACM Symposium on Cloud Computing (SoCC’12). ACM, New York, NY, Article 7, 13 pages.
  42. Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. 2013. Omega: Flexible, scalable schedulers for large compute clusters. In Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys’13). ACM, New York, NY, 351--364.
  43. Yannis Sfakianakis, Stelios Mavridis, Anastasios Papagiannis, Spyridon Papageorgiou, Markos Fountoulakis, Manolis Marazakis, and Angelos Bilas. 2014. Vanguard: Increasing server efficiency via workload isolation in the storage I/O path. In Proceedings of the ACM Symposium on Cloud Computing (SOCC’14). ACM, New York, NY, Article 19, 13 pages.
  44. Zhiming Shen, Sethuraman Subbiah, Xiaohui Gu, and John Wilkes. 2011. CloudScale: Elastic resource scaling for multi-tenant cloud systems. In Proceedings of the 2nd ACM Symposium on Cloud Computing (SOCC’11). ACM, New York, NY, Article 5, 14 pages.
  45. A. Torralba, R. Fergus, and W. T. Freeman. 2008. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 30, 11 (Nov. 2008), 1958--1970.
  46. Vinod Kumar Vavilapalli, Arun C. Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, Bikas Saha, Carlo Curino, Owen O’Malley, Sanjay Radia, Benjamin Reed, and Eric Baldeschwieler. 2013. Apache Hadoop YARN: Yet another resource negotiator. In Proceedings of the 4th Annual Symposium on Cloud Computing (SOCC’13). ACM, New York, NY, Article 5, 16 pages.
  47. Shivaram Venkataraman, Zongheng Yang, Michael Franklin, Benjamin Recht, and Ion Stoica. 2016. Ernest: Efficient performance prediction for large-scale advanced analytics. In Proceedings of the 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI’16). USENIX Association, Santa Clara, CA, 363--378.
  48. Abhishek Verma, Ludmila Cherkasova, and Roy H. Campbell. 2011. ARIA: Automatic resource inference and allocation for MapReduce environments. In Proceedings of the 8th ACM International Conference on Autonomic Computing (ICAC’11). ACM, New York, NY, 235--244.
  49. Hailong Yang, Alex Breslow, Jason Mars, and Lingjia Tang. 2013. Bubble-flux: Precise online QoS management for increased utilization in warehouse scale computers. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA’13). ACM, New York, NY, 607--618.
  51. H. Yun, G. Yao, R. Pellizzoni, M. Caccamo, and L. Sha. 2013. MemGuard: Memory bandwidth reservation system for efficient performance isolation in multi-core platforms. In Proceedings of the 2013 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS’13). 55--64.
  52. Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J. Freedman. 2017. Live video analytics at scale with approximation and delay-tolerance. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI’17). USENIX Association, 377--392. https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/zhang.
  53. Xiao Zhang, Eric Tune, Robert Hagmann, Rohit Jnagal, Vrigo Gokhale, and John Wilkes. 2013. CPI2: CPU performance isolation for shared compute clusters. In Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys’13). ACM, New York, NY, 379--391.
  54. Yunqi Zhang, Michael A. Laurenzano, Jason Mars, and Lingjia Tang. 2014. SMiTe: Precise QoS prediction on real-system SMT processors to improve utilization in warehouse scale computers. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 406--418.
  55. Z. Zhang, K. Barbary, F. A. Nothaft, E. Sparks, O. Zahn, M. J. Franklin, D. A. Patterson, and S. Perlmutter. 2015. Scientific computing meets big data technology: An astronomy use case. In Proceedings of the 2015 IEEE International Conference on Big Data (Big Data’15). 918--927.


        • Published in

          ACM Transactions on Architecture and Code Optimization, Volume 15, Issue 3
          September 2018
          322 pages
          ISSN: 1544-3566
          EISSN: 1544-3973
          DOI: 10.1145/3274266

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 28 August 2018
          • Accepted: 1 April 2018
          • Revised: 1 March 2018
          • Received: 1 September 2017


          Qualifiers

          • research-article
          • Research
          • Refereed
