ABSTRACT
Modern data analytics systems use long-running executors to run an application's entire DAG. Executors exhibit salient time-varying resource requirements. Yet, existing schedulers simply reserve resources for executors statically, and use the peak resource demand to guide executor placement. This leads to low utilization and poor application performance.
We present Elasecutor, a novel executor scheduler for data analytics systems. Elasecutor dynamically allocates and explicitly sizes resources to executors over time according to the predicted time-varying resource demands. Rather than placing executors using their peak demand, Elasecutor strategically assigns them to machines based on a concept called dominant remaining resource to minimize resource fragmentation. Elasecutor further adaptively reprovisions resources in order to tolerate inaccurate demand prediction. Testbed evaluation on a 35-node cluster with our Spark-based prototype implementation shows that Elasecutor reduces makespan by more than 42% on average, reduces median application completion time by up to 40%, and improves cluster utilization by up to 55% compared to existing work.
- Apache Flink. http://flink.apache.org.Google Scholar
- Apache Hadoop. http://hadoop.apache.org.Google Scholar
- Apache Spark. https://spark.apache.org.Google Scholar
- Apache Storm. http://storm.apache.org.Google Scholar
- Apache Tez. http://tez.apache.org.Google Scholar
- Capacity Scheduler. http://bit.ly/1tGpbDN.Google Scholar
- Cluster Mode Overview. https://spark.apache.org/docs/2.1.0/cluster-overview.html.Google Scholar
- Elasecutor. https://github.com/NetX-lab/Elasecutor.Google Scholar
- Fair Scheduler. https://spark.apache.org/docs/2.1.0/job-scheduling.html#fair-scheduler-pools.Google Scholar
- HiBench. https://github.com/intel-hadoop/HiBench.Google Scholar
- How-to: Tune Your Apache Spark Jobs. http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2.Google Scholar
- Kubernetes. https://kubernetes.io/.Google Scholar
- OpenJDK. http://openjdk.java.net.Google Scholar
- Resource Allocation Policy in Spark 2.1.0. https://spark.apache.org/docs/2.1.0/job-scheduling.html#resource-allocation-policy.Google Scholar
- S. Agarwal, S. Kandula, N. Bruno, M.-C. Wu, I. Stoica, and J. Zhou. Re-optimizing Data Parallel Computing. In Proc. USENIX NSDI, 2012. Google ScholarDigital Library
- M. K. Aguilera, N. Amit, I. Calciu, X. Deguillard, J. Gandhi, P. Subrahmanyam, L. Suresh, K. Tati, R. Venkatasubramanian, and M. Wei. Remote Memory in the Age of Fast Networks. In Proc. ACM SoCC, 2017. Google ScholarDigital Library
- O. Alipourfard, H. H. Liu, J. Chen, S. Venkataraman, M. Yu, and M. Zhang. CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics. In Proc. USENIX NSDI, 2017. Google ScholarDigital Library
- J. Bergstra, R. Bardenet, Y. Bengio, and B. Kegl. Algorithms for Hyper-Parameter Optimization. In Proc. NIPS, 2011. Google ScholarDigital Library
- E. Boutin, J. Ekanayake, W. Lin, B. Shi, J. Zhou, Z. Qian, M. Wu, and L. Zhou. Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing. In Proc. USENIX OSDI, 2014. Google ScholarDigital Library
- M. Chowdhury and I. Stoica. Efficient Coflow Scheduling Without Prior Knowledge. In Proc. ACM SIGCOMM, 2015. Google ScholarDigital Library
- C. Curino, D. E. Difallah, C. Douglas, S. Krishnan, R. Ramakrishnan, and S. Rao. Reservation-based Scheduling: If You're Late Don't Blame Us! In Proc. ACM SoCC, 2014. Google ScholarDigital Library
- J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proc. OSDI, 2004. Google ScholarDigital Library
- C. Delimitrou and C. Kozyrakis. Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters. In Proc. ACM ASPLOS, 2013. Google ScholarDigital Library
- C. Delimitrou and C. Kozyrakis. Quasar: Resource-Efficient and QoS-Aware Cluster Management. In Proc. ACM ASPLOS, 2014. Google ScholarDigital Library
- H. Drucker, C. J. C. Burges, L. Kaufman, A. Smola, and V. Vapnik. Support Vector Regression Machines. In Proc. NIPS, 1996. Google ScholarDigital Library
- A. D. Ferguson, P. Bodik, S. Kandula, E. Boutin, and R. Fonseca. Jockey: Guaranteed Job Latency in Data Parallel Clusters. In Proc. ACM EuroSys, 2012. Google ScholarDigital Library
- P. Garefalakis, K. Karanasos, P. Pietzuch, A. Suresh, and S. Rao. Medea: Scheduling of Long Running Applications in Shared Production Clusters. In Proc. ACM EuroSys, 2018. Google ScholarDigital Library
- M. R. Garey, R. L. Graham, and J. D. Ullman. Worst-case Analysis of Memory Allocation Algorithms. In Proc. ACM STOC, 1972. Google ScholarDigital Library
- A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica. Dominant Resource Fairness: Fair Allocation of Multiple Resource Types. In Proc. USENIX NSDI, 2011. Google ScholarDigital Library
- Z. Gong, X. Gu, and J. Wilkes. PRESS: PRedictive Elastic ReSource Scaling for cloud systems. In Proc. IEEE CNSM, 2010.Google Scholar
- R. Grandl, G. Ananthanarayanan, S. Kandula, S. Rao, and A. Akella. Multi-Resource Packing for Cluster Schedulers. In Proc. ACM SIGCOMM, 2014. Google ScholarDigital Library
- R. Grandl, M. Chowdhury, A. Akella, and G. Ananthanarayanan. Altruistic Scheduling in Multi-Resource Clusters. In Proc. USENIX OSDI, 2016. Google ScholarDigital Library
- R. Grandl, S. Kandula, S. Rao, A. Akella, and J. Kulkarni. Graphene: Packing and Dependency-Aware Scheduling for Data-Parallel Clusters. In Proc. USENIX OSDI, 2016. Google ScholarDigital Library
- M. Grzegorz, H. A. Matthew, J. C. B. Aart, C. D. James, H. Ilan, L. Naty, and C. Grzegorz. Pregel: A System for Large-Scale Graph Processing. In Proc. ACM SIGMOD, 2010. Google ScholarDigital Library
- B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In Proc. USENIX NSDI, 2011. Google ScholarDigital Library
- C. Iorgulescu, F. Dinu, A. Raza, W. U. Hassan, and W. Zwaenepoel. Don't cry over spilled records: Memory elasticity of data-parallel applications and its application to cluster scheduling. In Proc. USENIX ATC, 2017. Google ScholarDigital Library
- M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed Data-parallel Programs from Sequential Building Blocks. In Proc. ACM EuroSys, 2007. Google ScholarDigital Library
- M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg. Quincy: Fair Scheduling for Distributed Computing Clusters. In Proc. ACM SOSP, 2009. Google ScholarDigital Library
- E. G. Joseph, S. X. Reynold, D. Ankur, C. Daniel, J. F. Michael, and S. Ion. GraphX: Graph Processing in a Distributed Dataflow Framework. In Proc. USENIX OSDI, 2014. Google ScholarDigital Library
- K. Karanasos, S. Rao, C. Curino, C. Douglas, K. Chaliparambil, G. M. Fumarola, S. Heddaya, R. Ramakrishnan, and S. Sakalanaga. Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters. In Proc. USENIX ATC, 2015. Google ScholarDigital Library
- S. Kim, S. Huh, Y. Hu, X. Zhang, E. Witchel, A. Wated, and M. Silberstein. GPUnet: Networking Abstractions for GPU Programs. In Proc. USENIX OSDI, 2014. Google ScholarDigital Library
- R. Kondor and T. Jebara. A Kernel Between Sets of Vectors. In Proc. ICML, 2003. Google ScholarDigital Library
- T. Koponen, M. Casado, N. Gude, J. Stribling, L. Poutievski, M. Zhu, R. Ramanathan, Y. Iwata, H. Inoue, T. Hama, and S. Shenker. Onix: A distributed control platform for large-scale production networks. In Proc. OSDI, 2010. Google ScholarDigital Library
- M. Kornacker, A. Behm, V. Bittorf, T. Bobrovytsky, C. Ching, A. Choi, J. Erickson, M. Grund, D. Hecht, M. Jacobs, I. Joshi, L. Kuff, D. Kumar, A. Leblang, N. Li, I. Pandis, H. Robinson, D. Rorke, S. Rus, J. Russell, D. Tsirogiannis, S. Wanderman-Milne, and M. Yoder. Impala: A Modern, Open-Source SQL Engine for Hadoop. In Proc. CIDR, 2015.Google Scholar
- M. Korupolu, A. Singh, and B. Bamba. Coupled placement in modern data centers. In Proc. IPTPS, 2009.Google ScholarDigital Library
- A. Kuzmanovska, R. H. Mak, and D. Epema. KOALA-F: A Resource Manager for Scheduling Frameworks in Clusters. In Proc. IEEE/ACM CCGrid, 2016.Google ScholarDigital Library
- B. Li, K. Tan, L. L. Luo, Y. Peng, R. Luo, N. Xu, Y. Xiong, P. Cheng, and E. Chen. ClickNP: Highly Flexible and High Performance Network Processing with Reconfigurable Hardware. In Proc. ACM SIGCOMM, 2016. Google ScholarDigital Library
- Y. Lu, A. Chowdhery, and S. Kandula. Optasia: A Relational Platform for Efficient Large-Scale Video Analytics. In Proc. ACM SoCC, 2016. Google ScholarDigital Library
- H. Mao, M. Alizadeh, I. Menache, and S. Kandula. Resource Management with Deep Reinforcement Learning. In Proc. ACM HotNets, 2016. Google ScholarDigital Library
- Z. Matei, D. Tathagata, L. Haoyuan, H. Timothy, S. Scott, and S. Ion. Discretized Streams: Fault-Tolerant Streaming Computation at Scale. In Proc. ACM SOSP, 2013. Google ScholarDigital Library
- C. Michele, R. Yan, and L. Zheng. Adaptive Kernel Approximation for Large-Scale Non-Linear SVM Prediction. In Proc. ICML, 2011. Google ScholarDigital Library
- K. Morton, M. Balazinska, and D. Grossman. ParaTimer: A Progress Indicator for MapReduce DAGs. In Proc. ACM SIGMOD, 2010. Google ScholarDigital Library
- H. Nguyen, Z. Shen, X. Gu, S. Subbiah, and J. Wilkes. AGILE: Elastic Distributed Resource Scaling for Infrastructure-as-a-Service. In Proc. USENIX ICAC, 2013.Google Scholar
- K. Ousterhout, C. Canel, S. Ratnasamy, and S. Shenker. Monotasks: Architecting for Performance Clarity in Data Analytics Frameworks. In Proc. ACM SOSP, 2017. Google ScholarDigital Library
- K. Ousterhout, R. Rasti, S. Ratnasamy, S. Shenker, and B.-G. Chun. Making Sense of Performance in Data Analytics Frameworks. In Proc. USENIX NSDI, 2015. Google ScholarDigital Library
- K. Ousterhout, P. Wendell, M. Zaharia, and I. Stoica. Sparrow: Distributed, Low Latency Scheduling. In Proc. ACM SOSP, 2013. Google ScholarDigital Library
- J. Rasley, K. Karanasos, S. Kandula, R. Fonseca, M. Vojnovic, and S. Rao. Efficient queue management for cluster scheduling. In Proc. ACM EuroSys, 2016. Google ScholarDigital Library
- A. Roy, H. Zeng, J. Bagga, G. Porter, and A. C. Snoeren. Inside the Social Network's (Datacenter) Network. In Proc. ACM SIGCOMM, 2015. Google ScholarDigital Library
- B. Schoikopr, P. Bartlett, A. Smola, and R. Williamson. Shrinking the Thbe: A New Support Vector Regression Algorithm. In Proc. NIPS, 1999. Google ScholarDigital Library
- M. Schwarzkopf, A. Konwinski, M. Abd-El-Malek, and J. Wilkes. Omega: Flexible, Scalable Schedulers for Large Compute Clusters. In Proc. EuroSys, 2013. Google ScholarDigital Library
- Z. Shen, S. Subbiah, X. Gu, and J. Wilkes. CloudScale: Elastic Resource Scaling for Multi-Tenant Cloud Systems. In Proc. ACM SoCC, 2011. Google ScholarDigital Library
- A. Singh, J. Ong, A. Agarwal, G. Anderson, A. Armistead, R. Bannon, S. Boving, G. Desai, B. Felderman, P. Germano, A. Kanagala, J. Provost, J. Simmons, E. Tanda, J. Wanderer, U. Hölzle, S. Stuart, and A. Vahdat. Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network. In Proc. ACM SIGCOMM, 2015. Google ScholarDigital Library
- A. J. Smola and B. Schölkopf. A Tutorial on Support Vector Regression. http://www.svms.org/regression/SmSc98.pdf, Technical Report, 1998.Google Scholar
- J. Son, Y. Xiong, K. Tan, P. Wang, Z. Gan, and S. Moon. Protego: Cloud-Scale Multitenant IPsec Gateway. In Proc. USENIX ATC, 2017. Google ScholarDigital Library
- A. Toshniwal, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel, S. Kulkarni, J. Jackson, K. Gade, M. Fu, J. Donham, N. Bhagat, S. Mittal, and D. Ryaboy. Storm@Twitter. In Proc. ACM SIGMOD, 2014. Google ScholarDigital Library
- A. Toshniwal, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel, S. Kulkarni, J. Jackson, K. Gade, M. Fu, J. Donham, N. Bhagat, S. Mittal, and D. Ryaboy. Morpheus: Towards Automated SLOs for Enterprise Clusters. In Proc. USENIX OSDI, 2016. Google ScholarDigital Library
- V. K. Vavilapalli, A. C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth, et al. Apache Hadoop YARN: Yet Another Resource Negotiator. In Proc. ACM SoCC, 2013. Google ScholarDigital Library
- S. Venkataraman, Z. Yang, M. Franklin, B. Recht, and I. Stoica. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics. In Proc. USENIX NSDI, 2016. Google ScholarDigital Library
- A. Verma, L. Pedrosa, M. Korupolu, D. Oppenheimer, E. Tune, and J. Wilkes. Large-scale Cluster Management at Google with Borg. In Proc. ACM EuroSys, 2015. Google ScholarDigital Library
- J. Wang and M. Balazinska. Elastic Memory Management for Cloud Data Analytics. In Proc. USENIX ATC, 2017. Google ScholarDigital Library
- M. Weimer, Y. Chen, B.-G. Chun, T. Condie, C. Curino, C. Douglas, Y. Lee, T. Majestro, D. Malkhi, S. Matusevych, B. Myers, S. Narayanamurthy, R. Ramakrishnan, S. Rao, R. Sears, B. Sezgin, and J. Wang. REEF: Retainable Evaluator Execution Framework. In Proc. ACM SIGMOD, 2015. Google ScholarDigital Library
- G. J. Woeginger. There is no asymptotic PTAS for two-dimensional vector packing. Information Processing Letters, 64:293--297, June 1997. Google ScholarDigital Library
- W. Xia, H. Jiang, D. Feng, and Y. Hua. SiLo: A Similarity-Locality based Near-Exact Deduplication Scheme with Low RAM Overhead and High Throughput. In Proc. USENIX ATC, 2011. Google ScholarDigital Library
- D. Xie, N. Ding, Y. C. Hu, and R. Kompella. The Only Constant is Change: Incorporating Time-Varying Network Reservations in Data Centers. In Proc. ACM SIGCOMM, 2012. Google ScholarDigital Library
- G. Xu and C.-Z. Xu. Prometheus: Online Estimation of Optimal Memory Demands for Workers in In-memory Distributed Computation. In Proc. ACM SoCC poster, 2017. Google ScholarDigital Library
- G. Xu, C.-Z. Xu, and S. Jiang. Prophet: Scheduling Executors with Time-varying Resource Demands on Data-Parallel Computation Frameworks. In Proc. USENIX ICAC, 2016.Google ScholarCross Ref
- Y. Yan, Y. Gao, Y. Chen, Z. Guo, B. Chen, and T. Moscibroda. TR-Spark: Transient Computing for Big Data Analytics. In Proc. ACM SoCC, 2016. Google ScholarDigital Library
- M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling. In Proc. ACM EuroSys, 2010. Google ScholarDigital Library
- M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In Proc. USENIX NSDI, 2012. Google ScholarDigital Library
- H. Zhang, L. Stafman, A. Or, and M. J. Freedman. SLAQ: Quality-Driven Scheduling for Distributed Machine Learning. In Proc. ACM SoCC, 2017. Google ScholarDigital Library
Index Terms
- Elasecutor: Elastic Executor Scheduling in Data Analytics Systems
Recommendations
Elasecutor: Elastic Executor Scheduling in Data Analytics Systems
Modern data analytics systems use long-running executors to run an application’s entire DAG. Executors exhibit salient time-varying resource requirements. Yet, existing schedulers simply reserve resources for executors statically, and use the peak ...
Algorithms for implementing elastic tasks on multiprocessor platforms: a comparative evaluation
AbstractThe elastic task model enables the adaptation of systems of recurrent real-time tasks under uncertain or potentially overloaded conditions. A range of permissible periods is specified for each task in this model; during run-time a period is ...
Rethinking elastic online scheduling of big data streaming applications over high-velocity continuous data streams
Online scheduling plays a key role for big data streaming applications in a big data stream computing environment, as the arrival rate of high-velocity continuous data stream might fluctuate over time. In this paper, an elastic online scheduling ...
Comments