ABSTRACT
Increasing scale and the need for rapid response to changing requirements are hard to meet with current monolithic cluster scheduler architectures. This restricts the rate at which new features can be deployed, decreases efficiency and utilization, and will eventually limit cluster growth. We present a novel approach to address these needs using parallelism, shared state, and lock-free optimistic concurrency control.
We compare this approach to existing cluster scheduler designs, evaluate how much interference between schedulers occurs and how much it matters in practice, present some techniques to alleviate it, and finally discuss a use case highlighting the advantages of our approach -- all driven by real-life Google production workloads.
- Adaptive Computing Enterprises Inc. Maui Scheduler Administrator's Guide, 3.2 ed. Provo, UT, 2011.Google Scholar
- Adl-Tabatabai, A.-R., Lewis, B. T., Menon, V., Murphy, B. R., Saha, B., and Shpeisman, T. Compiler and runtime support for efficient software transactional memory. In Proceedings of PLDI (2006), pp. 26--37. Google ScholarDigital Library
- Ananthanarayanan, G., Douglas, C., Ramakrishnan, R., Rao, S., and Stoica, I. True elasticity in multitenant data-intensive compute clusters. In Proceedings of SoCC (2012), p. 24. Google ScholarDigital Library
- Apache. Hadoop On Demand. http://goo.gl/px8Yd, 2007. Accessed 20/06/2012.Google Scholar
- Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. Bigtable: A Distributed Storage System for Structured Data. ACM Transactions on Computer Systems 26, 2 (June 2008), 4:1--4:26. Google ScholarDigital Library
- Chen, Y., Alspaugh, S., Borthakur, D., and Katz, R. Energy efficiency for large-scale MapReduce workloads with significant interactive analysis. In Proceedings of EuroSys (2012). Google ScholarDigital Library
- Chen, Y., Ganapathi, A. S., Griffith, R., and Katz, R. H. Design insights for MapReduce from diverse production workloads. Tech. Rep. UCB/EECS-2012-17, UC Berkeley, Jan. 2012.Google ScholarCross Ref
- Dean, J., and Ghemawat, S. MapReduce: Simplified data processing on large clusters. CACM 51, 1 (2008), 107--113. Google ScholarDigital Library
- Engler, D. R., Kaashoek, M. F., and O'Toole, Jr., J. Exokernel: an operating system architecture for application-level resource management. In Proceedings of SOSP (1995), pp. 251--266. Google ScholarDigital Library
- Ferguson, A. D., Bodik, P., Kandula, S., Boutin, E., and Fonseca, R. Jockey: guaranteed job latency in data parallel clusters. In Proceedings of EuroSys (2012), pp. 99--112. Google ScholarDigital Library
- Ghodsi, A., Zaharia, M., Hindman, B., Konwinski, A., Shenker, S., and Stoica, I. Dominant resource fairness: fair allocation of multiple resource types. In Proceedings of NSDI (2011), pp. 323--336. Google ScholarDigital Library
- Herodotou, H., Dong, F., and Babu, S. No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics. In Proceedings of SoCC (2011). Google ScholarDigital Library
- Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A., Katz, R., Shenker, S., and Stoica, I. Mesos: a platform for fine-grained resource sharing in the data center. In Proceedings of NSDI (2011). Google ScholarDigital Library
- Iqbal, S., Gupta, R., and Fang, Y.-C. Planning considerations for job scheduling in HPC clusters. Dell Power Solutions (Feb. 2005).Google Scholar
- Isard, M., Prabhakaran, V., Currey, J., Wieder, U., Talwar, K., and Goldberg, A. Quincy: fair scheduling for distributed computing clusters. In Proceedings of SOSP (2009). Google ScholarDigital Library
- Jackson, D. and Snell, Q. and Clement, M. Core algorithms of the Maui scheduler. In Job Scheduling Strategies for Parallel Processing. 2001, pp. 87--102. Google ScholarDigital Library
- Kavulya, S., Tan, J., Gandhi, R., and Narasimhan, P. An analysis of traces from a production MapReduce cluster. In Proceedings of CCGrid (2010), pp. 94--103. Google ScholarDigital Library
- Kung, H. T., and Robinson, J. T. On optimistic methods for concurrency control. ACM Transactions on Database Systems 6, 2 (June 1981), 213--226. Google ScholarDigital Library
- Malewicz, G., Austern, M., Bik, A., Dehnert, J., Horn, I., Leiser, N., and Czajkowski, G. Pregel: a system for large-scale graph processing. In Proceedings of SIGMOD (2010), pp. 135--146. Google ScholarDigital Library
- Mishra, A. K., Hellerstein, J. L., Cirne, W., and Das, C. R. Towards characterizing cloud backend workloads: insights from Google compute clusters. SIGMETRICS Performance Evaluation Review 37 (Mar. 2010), 34--41. Google ScholarDigital Library
- Murthy, A. C., Douglas, C., Konar, M., O'Malley, O., Radia, S., Agarwal, S., and K V, V. Architecture of next generation Apache Hadoop MapReduce framework. Tech. rep., Apache Hadoop, 2011.Google Scholar
- Pan, H., Hindman, B., and Asanović, K. Lithe: enabling efficient composition of parallel libraries. In Proceedings of HotPar (2009). Google ScholarDigital Library
- Peng, D., and Dabek, F. Large-scale incremental processing using distributed transactions and notifications. In Proceedings of OSDI (2010). Google ScholarDigital Library
- Reiss, C., Tumanov, A., Ganger, G. R., Katz, R. H., and Kozuch, M. A. Heterogeneity and dynamicity of clouds at scale: Google trace analysis. In Proceedings of SoCC (2012). Google ScholarDigital Library
- Sharma, B., Chudnovsky, V., Hellerstein, J., Rifaat, R., and Das, C. Modeling and synthesizing task placement constraints in Google compute clusters. In Proceedings of SoCC (2011). Google ScholarDigital Library
- Verma, A., Cherkasova, L., and Campbell, R. SLO-driven right-sizing and resource provisioning of MapReduce jobs. In Proceedings of LADIS (2011).Google Scholar
- Wilkes, J. More Google cluster data. Google research blog, Nov. 2011. Posted at http://goo.gl/9B7PA.Google Scholar
- Zaharia, M., Borthakur, D., Sen Sarma, J., Elmeleegy, K., Shenker, S., and Stoica, I. Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling. In Proceedings of EuroSys (2010), pp. 265--278. Google ScholarDigital Library
- Zhang, Q., Hellerstein, J., and Boutaba, R. Characterizing task usage shapes in Google's compute clusters. In Proceedings of LADIS (2011).Google Scholar
Index Terms
- Omega: flexible, scalable schedulers for large compute clusters
Recommendations
An extended fine-grained conflict detection method for shared-state scheduling in large scale cluster
ICIIP '16: Proceedings of the 1st International Conference on Intelligent Information ProcessingNowadays, shared-state scheduling architecturehas been paid more attention in the large-scale cluster scheduling area. As a crucial part of shared-state scheduling architecture, optimistic concurrency control (OCC) algorithm has been studied by database ...
Improving transaction abort rates without compromising throughput through judicious scheduling
SAC '13: Proceedings of the 28th Annual ACM Symposium on Applied ComputingAlthought optimistic concurrency control protocols have increasingly been used in distributed database management systems, they imply a trade-off between the number of transactions that can be executed concurrently, hence, the peak throughput, and ...
Infinite Resources for Optimistic Concurrency Control
NetCompute '18: Proceedings of the 2018 Morning Workshop on In-Network ComputingOptimistic concurrency control (OCC) is inefficient for high-contention workloads. When concurrent transactions conflict, an OCC system wastes CPU resources verifying transactions, only to abort them. This paper presents a new system, called Network ...
Comments