research-article

Omega: flexible, scalable schedulers for large compute clusters

Authors:
Malte Schwarzkopf

University of Cambridge and Google, Inc.

University of Cambridge and Google, Inc.
View Profile

,
Andy Konwinski

University of California, Berkeley and Google, Inc.

University of California, Berkeley and Google, Inc.
View Profile

,
Michael Abd-El-Malek

Google, Inc.

Google, Inc.
View Profile

,
John Wilkes

Google, Inc.

Google, Inc.
View Profile

EuroSys '13: Proceedings of the 8th ACM European Conference on Computer SystemsApril 2013Pages 351–364https://doi.org/10.1145/2465351.2465386

Published:15 April 2013Publication History

EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems

Pages 351–364

ABSTRACT

Increasing scale and the need for rapid response to changing requirements are hard to meet with current monolithic cluster scheduler architectures. This restricts the rate at which new features can be deployed, decreases efficiency and utilization, and will eventually limit cluster growth. We present a novel approach to address these needs using parallelism, shared state, and lock-free optimistic concurrency control.

We compare this approach to existing cluster scheduler designs, evaluate how much interference between schedulers occurs and how much it matters in practice, present some techniques to alleviate it, and finally discuss a use case highlighting the advantages of our approach -- all driven by real-life Google production workloads.

References

Adaptive Computing Enterprises Inc. Maui Scheduler Administrator's Guide, 3.2 ed. Provo, UT, 2011.Google Scholar
Adl-Tabatabai, A.-R., Lewis, B. T., Menon, V., Murphy, B. R., Saha, B., and Shpeisman, T. Compiler and runtime support for efficient software transactional memory. In Proceedings of PLDI (2006), pp. 26--37. Google ScholarDigital Library
Ananthanarayanan, G., Douglas, C., Ramakrishnan, R., Rao, S., and Stoica, I. True elasticity in multitenant data-intensive compute clusters. In Proceedings of SoCC (2012), p. 24. Google ScholarDigital Library
Apache. Hadoop On Demand. http://goo.gl/px8Yd, 2007. Accessed 20/06/2012.Google Scholar
Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. Bigtable: A Distributed Storage System for Structured Data. ACM Transactions on Computer Systems 26, 2 (June 2008), 4:1--4:26. Google ScholarDigital Library
Chen, Y., Alspaugh, S., Borthakur, D., and Katz, R. Energy efficiency for large-scale MapReduce workloads with significant interactive analysis. In Proceedings of EuroSys (2012). Google ScholarDigital Library
Chen, Y., Ganapathi, A. S., Griffith, R., and Katz, R. H. Design insights for MapReduce from diverse production workloads. Tech. Rep. UCB/EECS-2012-17, UC Berkeley, Jan. 2012.Google ScholarCross Ref
Dean, J., and Ghemawat, S. MapReduce: Simplified data processing on large clusters. CACM 51, 1 (2008), 107--113. Google ScholarDigital Library
Engler, D. R., Kaashoek, M. F., and O'Toole, Jr., J. Exokernel: an operating system architecture for application-level resource management. In Proceedings of SOSP (1995), pp. 251--266. Google ScholarDigital Library
Ferguson, A. D., Bodik, P., Kandula, S., Boutin, E., and Fonseca, R. Jockey: guaranteed job latency in data parallel clusters. In Proceedings of EuroSys (2012), pp. 99--112. Google ScholarDigital Library
Ghodsi, A., Zaharia, M., Hindman, B., Konwinski, A., Shenker, S., and Stoica, I. Dominant resource fairness: fair allocation of multiple resource types. In Proceedings of NSDI (2011), pp. 323--336. Google ScholarDigital Library
Herodotou, H., Dong, F., and Babu, S. No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics. In Proceedings of SoCC (2011). Google ScholarDigital Library
Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A., Katz, R., Shenker, S., and Stoica, I. Mesos: a platform for fine-grained resource sharing in the data center. In Proceedings of NSDI (2011). Google ScholarDigital Library
Iqbal, S., Gupta, R., and Fang, Y.-C. Planning considerations for job scheduling in HPC clusters. Dell Power Solutions (Feb. 2005).Google Scholar
Isard, M., Prabhakaran, V., Currey, J., Wieder, U., Talwar, K., and Goldberg, A. Quincy: fair scheduling for distributed computing clusters. In Proceedings of SOSP (2009). Google ScholarDigital Library
Jackson, D. and Snell, Q. and Clement, M. Core algorithms of the Maui scheduler. In Job Scheduling Strategies for Parallel Processing. 2001, pp. 87--102. Google ScholarDigital Library
Kavulya, S., Tan, J., Gandhi, R., and Narasimhan, P. An analysis of traces from a production MapReduce cluster. In Proceedings of CCGrid (2010), pp. 94--103. Google ScholarDigital Library
Kung, H. T., and Robinson, J. T. On optimistic methods for concurrency control. ACM Transactions on Database Systems 6, 2 (June 1981), 213--226. Google ScholarDigital Library
Malewicz, G., Austern, M., Bik, A., Dehnert, J., Horn, I., Leiser, N., and Czajkowski, G. Pregel: a system for large-scale graph processing. In Proceedings of SIGMOD (2010), pp. 135--146. Google ScholarDigital Library
Mishra, A. K., Hellerstein, J. L., Cirne, W., and Das, C. R. Towards characterizing cloud backend workloads: insights from Google compute clusters. SIGMETRICS Performance Evaluation Review 37 (Mar. 2010), 34--41. Google ScholarDigital Library
Murthy, A. C., Douglas, C., Konar, M., O'Malley, O., Radia, S., Agarwal, S., and K V, V. Architecture of next generation Apache Hadoop MapReduce framework. Tech. rep., Apache Hadoop, 2011.Google Scholar
Pan, H., Hindman, B., and Asanović, K. Lithe: enabling efficient composition of parallel libraries. In Proceedings of HotPar (2009). Google ScholarDigital Library
Peng, D., and Dabek, F. Large-scale incremental processing using distributed transactions and notifications. In Proceedings of OSDI (2010). Google ScholarDigital Library
Reiss, C., Tumanov, A., Ganger, G. R., Katz, R. H., and Kozuch, M. A. Heterogeneity and dynamicity of clouds at scale: Google trace analysis. In Proceedings of SoCC (2012). Google ScholarDigital Library
Sharma, B., Chudnovsky, V., Hellerstein, J., Rifaat, R., and Das, C. Modeling and synthesizing task placement constraints in Google compute clusters. In Proceedings of SoCC (2011). Google ScholarDigital Library
Verma, A., Cherkasova, L., and Campbell, R. SLO-driven right-sizing and resource provisioning of MapReduce jobs. In Proceedings of LADIS (2011).Google Scholar
Wilkes, J. More Google cluster data. Google research blog, Nov. 2011. Posted at http://goo.gl/9B7PA.Google Scholar
Zaharia, M., Borthakur, D., Sen Sarma, J., Elmeleegy, K., Shenker, S., and Stoica, I. Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling. In Proceedings of EuroSys (2010), pp. 265--278. Google ScholarDigital Library
Zhang, Q., Hellerstein, J., and Boutaba, R. Characterizing task usage shapes in Google's compute clusters. In Proceedings of LADIS (2011).Google Scholar

Index Terms

Omega: flexible, scalable schedulers for large compute clusters
1. Social and professional topics
  1. Professional topics
    1. Management of computing and information systems
      1. System management
        Centralization / decentralization
2. Software and its engineering
  1. Software organization and properties
    1. Software system structures
      1. Distributed systems organizing principles

Recommendations

An extended fine-grained conflict detection method for shared-state scheduling in large scale cluster
ICIIP '16: Proceedings of the 1st International Conference on Intelligent Information Processing

Nowadays, shared-state scheduling architecturehas been paid more attention in the large-scale cluster scheduling area. As a crucial part of shared-state scheduling architecture, optimistic concurrency control (OCC) algorithm has been studied by database ...
Read More
Improving transaction abort rates without compromising throughput through judicious scheduling
SAC '13: Proceedings of the 28th Annual ACM Symposium on Applied Computing

Althought optimistic concurrency control protocols have increasingly been used in distributed database management systems, they imply a trade-off between the number of transactions that can be executed concurrently, hence, the peak throughput, and ...
Read More
Infinite Resources for Optimistic Concurrency Control
NetCompute '18: Proceedings of the 2018 Morning Workshop on In-Network Computing

Optimistic concurrency control (OCC) is inefficient for high-contention workloads. When concurrent transactions conflict, an OCC system wastes CPU resources verifying transactions, only to abort them. This paper presents a new system, called Network ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems
April 2013
401 pages
ISBN:9781450319942
DOI:10.1145/2465351
General Chairs:
Zdenek Hanzálek
Czech Technical University Prague
,
Hermann Härtig
Technische Universität Dresden
,
Program Chairs:
Miguel Castro
Microsoft Research Cambridge
,
M. Frans Kaashoek
MIT
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 15 April 2013
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
cluster scheduling
optimistic concurrency control
Qualifiers
- research-article
Conference

Acceptance Rates
EuroSys '13 Paper Acceptance Rate28of143submissions,20%Overall Acceptance Rate241of1,308submissions,18%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 482
  Total Citations
  View Citations
- 3,355
  Total Downloads
- Downloads (Last 12 months)192
- Downloads (Last 6 weeks)20
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Omega: flexible, scalable schedulers for large compute clusters

EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

An extended fine-grained conflict detection method for shared-state scheduling in large scale cluster

Improving transaction abort rates without compromising throughput through judicious scheduling

Infinite Resources for Optimistic Concurrency Control

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Omega: flexible, scalable schedulers for large compute clusters

EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

An extended fine-grained conflict detection method for shared-state scheduling in large scale cluster

Improving transaction abort rates without compromising throughput through judicious scheduling

Infinite Resources for Optimistic Concurrency Control

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media