research-article

Kairos: Preemptive Data Center Scheduling Without Runtime Estimates

Authors:
Pamela Delgado

EPFL, Switzerland

EPFL, Switzerland
View Profile

,
Diego Didona

EPFL, Switzerland

EPFL, Switzerland
View Profile

,
Florin Dinu

EPFL, Switzerland and University of Sydney, Australia

EPFL, Switzerland and University of Sydney, Australia
View Profile

,
Willy Zwaenepoel

EPFL, Switzerland and University of Sydney, Australia

EPFL, Switzerland and University of Sydney, Australia
View Profile

SoCC '18: Proceedings of the ACM Symposium on Cloud ComputingOctober 2018Pages 135–148https://doi.org/10.1145/3267809.3267838

Published:11 October 2018Publication History

SoCC '18: Proceedings of the ACM Symposium on Cloud Computing

Pages 135–148

ABSTRACT

The vast majority of data center schedulers use task runtime estimates to improve the quality of their scheduling decisions. Knowledge about runtimes allows the schedulers, among other things, to achieve better load balance and to avoid head-of-line blocking. Obtaining accurate runtime estimates is, however, far from trivial, and erroneous estimates lead to sub-optimal scheduling decisions. Techniques to mitigate the effect of inaccurate estimates have shown some success, but the fundamental problem remains.

This paper presents Kairos, a novel data center scheduler that assumes no prior information on task runtimes. Kairos introduces a distributed approximation of the Least Attained Service (LAS) scheduling policy. Kairos consists of a centralized scheduler and per-node schedulers. The per-node schedulers implement LAS for tasks on their node, using preemption as necessary to avoid head-of-line blocking. The centralized scheduler distributes tasks among nodes in a manner that balances the load and imposes on each node a workload in which LAS provides favorable performance.

We have implemented Kairos in YARN. We compare its performance against the YARN FIFO scheduler and Big-C, an open-source state-of-the-art YARN-based scheduler that also uses preemption. Compared to YARN FIFO, Kairos reduces the median job completion time by 73% and the 99th percentile by 30%. Compared to Big-C, the improvements are 37% for the median and 57% for the 99th percentile. We evaluate Kairos at scale by implementing it in the Eagle simulator and comparing its performance against Eagle. Kairos improves the 99th percentile of short job completion times by up to 55% for the Google trace and 85% for the Yahoo trace.

References

O. Alipourfard, H. H. Liu, J. Chen, S. Venkataraman, M. Yu, and M. Zhang. Cherrypick: Adaptively unearthing the best cloud configurations for big data analytics. In Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation, NSDI'17, pages 469--482, Berkeley, CA, USA, 2017. USENIX Association. Google ScholarDigital Library
G. Ananthanarayanan, A. Ghodsi, S. Shenker, and I. Stoica. Effective straggler mitigation: Attack of the clones. In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation, NSDI'13, pages 185--198, Berkeley, CA, USA, 2013. USENIX Association. Google ScholarDigital Library
N. Avrahami and Y. Azar. Minimizing total flow time and total completion time with immediate dispatching. In Proceedings of the 15th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA'03, pages 11--18, New York, NY, USA, 2003. ACM. Google ScholarDigital Library
E. Boutin, J. Ekanayake, W. Lin, B. Shi, J. Zhou, Z. Qian, M. Wu, and L. Zhou. Apollo: Scalable and coordinated scheduling for cloud-scale computing. In 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI'14, pages 285--300, Broomfield, CO, Oct. 2014. USENIX Association. Google ScholarDigital Library
W. Chen, J. Rao, and X. Zhou. Preemptive, low latency datacenter scheduling via lightweight virtualization. In Proceedings of the 2017 USENIX Conference on Usenix Annual Technical Conference, USENIX ATC'17, pages 251--263, Santa Clara, CA, 2017. Google ScholarDigital Library
Y. Chen, S. Alspaugh, and R. Katz. Interactive analytical processing in big data systems: A cross-industry study of mapreduce workloads. Proceedings of the VLDB Endowment, 5(12):1802--1813, Aug. 2012. Google ScholarDigital Library
Y. Chen, A. Ganapathi, R. Griffith, and R. Katz. The case for evaluating mapreduce performance using workload suites. In Proceedings of the 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems, MASCOTS'11, pages 390--399, Washington, DC, USA, 2011. IEEE Computer Society. Google ScholarDigital Library
E. Coppa and I. Finocchi. On data skewness, stragglers, and mapreduce progress indicators. In Proceedings of the 6th ACM Symposium on Cloud Computing, SoCC'15, pages 139--152, New York, NY, USA, 2015. ACM. Google ScholarDigital Library
C. Curino, D. E. Difallah, C. Douglas, S. Krishnan, R. Ramakrishnan, and S. Rao. Reservation-based scheduling: If you're late don't blame us! In Proceedings of the ACM Symposium on Cloud Computing, SOCC'14, pages 2:1--2:14, New York, NY, USA, 2014. ACM. Google ScholarDigital Library
P. Delgado, D. Didona, F. Dinu, and W. Zwaenepoel. Job-aware scheduling in eagle: Divide and stick to your probes. In Proceedings of the Seventh ACM Symposium on Cloud Computing, number EPFL-CONF-221125, 2016. Google ScholarDigital Library
P. Delgado, F. Dinu, A.-M. Kermarrec, and W. Zwaenepoel. Hawk: Hybrid datacenter scheduling. In Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference, USENIX ATC'15, pages 499--510. USENIX Association, July 2015. Google ScholarDigital Library
C. Delimitrou and C. Kozyrakis. Quasar: Resource-efficient and qos-aware cluster management. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS'14, pages 127--144, New York, NY, USA, 2014. ACM. Google ScholarDigital Library
D. G. Down and R. Wu. Multi-layered round robin routing for parallel servers. Queueing Systems, 53(4):177--188, Aug. 2006. Google ScholarDigital Library
A. D. Ferguson, P. Bodik, S. Kandula, E. Boutin, and R. Fonseca. Jockey: Guaranteed job latency in data parallel clusters. In Proceedings of the 7th ACM European Conference on Computer Systems, EuroSys'12, pages 99--112, New York, NY, USA, 2012. ACM. Google ScholarDigital Library
P. Garefalakis, K. Karanasos, P. Pietzuch, A. Suresh, and S. Rao. Medea: Scheduling of long running applications in shared production clusters. In Proceedings of the 13th European Conference on Computer Systems, EuroSys'18, pages 4:1--4:13, 2018. Google ScholarDigital Library
B. Ghit and D. H. J. Epema. Tyrex: Size-based resource allocation in mapreduce frameworks. In IEEE/ACM 16th International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2016, Cartagena, Colombia, May 16-19, 2016, pages 11--20, 2016.Google ScholarDigital Library
A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica. Dominant resource fairness: Fair allocation of multiple resource types. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, NSDI'11, pages 323--336, Berkeley, CA, USA, 2011. USENIX Association. Google ScholarDigital Library
I. Gog, M. Schwarzkopf, A. Gleave, R. M. N. Watson, and S. Hand. Firmament: Fast, centralized cluster scheduling at scale. In Proc. of OSDI, 2016. Google ScholarDigital Library
R. Grandl, G. Ananthanarayanan, S. Kandula, S. Rao, and A. Akella. Multi-resource packing for cluster schedulers. ACM SIGCOMM Computer Communication Review, 44(4):455--466, Aug. 2014. Google ScholarDigital Library
R. Grandl, M. Chowdhury, A. Akella, and G. Ananthanarayanan. Altruistic scheduling in multi-resource clusters. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI'16, pages 65--80, Berkeley, CA, USA, 2016. USENIX Association. Google ScholarDigital Library
R. Grandl, S. Kandula, S. Rao, A. Akella, and J. Kulkarni. GRAPHENE: Packing and dependency-aware scheduling for data-parallel clusters. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI'16, pages 81--97, Savannah, GA, 2016. USENIX Association. Google ScholarDigital Library
Z. Hu, B. Li, Z. Qin, and R. S. M. Goh. Job scheduling without prior information in big data processing systems. In Proceedings of ICDCS, 2017.Google ScholarCross Ref
C.-C. Hung, L. Golubchik, and M. Yu. Scheduling jobs across geo-distributed datacenters. In Proceedings of the 6th Symposium on Cloud Computing, SoCC'15, pages 111--124, New York, NY, USA, 2015. ACM. Google ScholarDigital Library
M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg. Quincy: Fair scheduling for distributed computing clusters. In Proceedings of the ACM SIGOPS 22Nd Symposium on Operating Systems Principles, SOSP'09, pages 261--276, 2009. Google ScholarDigital Library
K. Karanasos, S. Rao, C. Curino, C. Douglas, K. Chaliparambil, G. M. Fumarola, S. Heddaya, R. Ramakrishnan, and S. Sakalanaga. Mercury: Hybrid centralized and distributed scheduling in large shared clusters. In Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference, USENIX ATC'15, pages 485--497, Berkeley, CA, USA, 2015. USENIX Association. Google ScholarDigital Library
Y. Kwon, M. Balazinska, B. Howe, and J. A. Rolia. Skewtune: mitigating skew in mapreduce applications. In K. S. Candan, Y. C. 0001, R. T. Snodgrass, L. Gravano, and A. Fuxman, editors, SIGMOD Conference, pages 25--36. ACM, 2012. Google ScholarDigital Library
M. Nuyens and A. Wierman. The foreground-background queue: A survey. Performance evaluation, 65(3-4):286--307, Mar. 2008. Google ScholarDigital Library
K. Ousterhout, P. Wendell, M. Zaharia, and I. Stoica. Sparrow: Distributed, low latency scheduling. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP'13, pages 69--84, New York, NY, USA, 2013. ACM. Google ScholarDigital Library
J. W. Park, A. Tumanov, A. Jiang, M. A. Kozuch, and G. R. Ganger. 3sigma: distribution-based cluster scheduling for runtime uncertainty. In Proceedings of the 13th European Conference on Computer Systems, EuroSys'18, page 2. ACM, 2018. Google ScholarDigital Library
J. Rasley, K. Karanasos, S. Kandula, R. Fonseca, M. Vojnovic, and S. Rao. Efficient queue management for cluster scheduling. In Proceedings of the 11th European Conference on Computer Systems, EuroSys'16, page 36. ACM, 2016. Google ScholarDigital Library
C. Reiss, A. Tumanov, G. R. Ganger, R. H. Katz, and M. A. Kozuch. Heterogeneity and dynamicity of clouds at scale: Google trace analysis. In Proceedings of the 3rd ACM Symposium on Cloud Computing, SoCC'12, pages 7:1--7:13, New York, NY, USA, 2012. ACM. Google ScholarDigital Library
L. Schrage. A proof of the optimality of the shortest remaining processing time discipline. Operations Research, 16(3):687--690, 1968.Google ScholarDigital Library
L. E. Schrage and L. W. Miller. The queue m/g/1 with the shortest remaining processing time discipline. Operations Research, 14(4):670--684, Aug. 1966. Google ScholarDigital Library
M. Schwarzkopf, A. Konwinski, M. Abd-El-Malek, and J. Wilkes. Omega: Flexible, scalable schedulers for large compute clusters. In Proceedings of the 8th European Conference on Computer Systems, EuroSys'13, pages 351--364, New York, NY, USA, 2013. ACM. Google ScholarDigital Library
A. Tumanov, T. Zhu, J. W. Park, M. A. Kozuch, M. Harchol-Balter, and G. R. Ganger. Tetrisched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters. In Proceedings of the 11th European Conference on Computer Systems, EuroSys'16, page 35. ACM, 2016. Google ScholarDigital Library
Y. Zhang, G. Prekas, G. M. Fumarola, M. Fontoura, I. n. Goiri, and R. Bianchini. History-based harvesting of spare cycles and storage in large-scale datacenters. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI'16, pages 755--770, Berkeley, CA, USA, 2016. USENIX Association. Google ScholarDigital Library

Index Terms

Kairos: Preemptive Data Center Scheduling Without Runtime Estimates
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling

Recommendations

Job-aware Scheduling in Eagle: Divide and Stick to Your Probes
SoCC '16: Proceedings of the Seventh ACM Symposium on Cloud Computing

We present Eagle, a new hybrid data center scheduler for data-parallel programs. Eagle dynamically divides the nodes of the data center in partitions for the execution of long and short jobs, thereby avoiding head-of-line blocking. Furthermore, it ...
Read More
Forget the Deadline: Scheduling Interactive Applications in Data Centers
CLOUD '15: Proceedings of the 2015 IEEE 8th International Conference on Cloud Computing

Many interactive applications running in data centers such as web search, social networks, online gaming, and financial services are delay-sensitive, and often have a deadline. These deadlines vary across users and applications, which makes the job ...
Read More
Scheduling of deteriorating jobs with release dates to minimize the maximum lateness

In this paper, we consider the problem of scheduling n deteriorating jobs with release dates on a single (batching) machine. Each job's processing time is a simple linear function of its starting time. The objective is to minimize the maximum lateness. ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in

SoCC '18: Proceedings of the ACM Symposium on Cloud Computing
October 2018
546 pages
ISBN:9781450360111
DOI:10.1145/3267809

Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 11 October 2018
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Cloud computing
Data center
Scheduling
Qualifiers
- research-article
- Research
- Refereed limited
Conference

Acceptance Rates
Overall Acceptance Rate169of722submissions,23%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 32
  Total Citations
  View Citations
- 557
  Total Downloads
- Downloads (Last 12 months)40
- Downloads (Last 6 weeks)9
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Kairos: Preemptive Data Center Scheduling Without Runtime Estimates

SoCC '18: Proceedings of the ACM Symposium on Cloud Computing

ABSTRACT

References

Cited By

Index Terms

Recommendations

Job-aware Scheduling in Eagle: Divide and Stick to Your Probes

Forget the Deadline: Scheduling Interactive Applications in Data Centers

Scheduling of deteriorating jobs with release dates to minimize the maximum lateness

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Kairos: Preemptive Data Center Scheduling Without Runtime Estimates

SoCC '18: Proceedings of the ACM Symposium on Cloud Computing

ABSTRACT

References

Cited By

Index Terms

Recommendations

Job-aware Scheduling in Eagle: Divide and Stick to Your Probes

Forget the Deadline: Scheduling Interactive Applications in Data Centers

Scheduling of deteriorating jobs with release dates to minimize the maximum lateness

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media