DOI:10.1145/2668930.2688058

A Constraint Programming Based Hadoop Scheduler for Handling MapReduce Jobs with Deadlines on Clouds

Published: 31 January 2015 Publication History

Abstract

This paper devises MRCP, a novel constraint programming-based matchmaking and scheduling algorithm that handles MapReduce jobs with deadlines while achieving high system performance. MRCP is incorporated into Hadoop, a widely used open-source implementation of the MapReduce programming model, as a new scheduler called the CP-Scheduler. The work originates from collaborative research with our industrial partner on engineering resource management middleware for high performance. The paper describes our experiences and the challenges we encountered in designing and implementing the prototype CP-based Hadoop scheduler. A detailed performance evaluation of the CP-Scheduler is conducted on Amazon EC2 to determine its effectiveness and to obtain insights into system behaviour and performance. The CP-Scheduler's performance is also compared with that of an earliest deadline first (EDF) Hadoop scheduler, which is implemented by extending Hadoop's default FIFO scheduler. The experimental results demonstrate that the CP-Scheduler can effectively handle an open stream of MapReduce jobs with deadlines in a Hadoop cluster.
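To make the EDF baseline named in the abstract concrete, the following is a minimal sketch of earliest-deadline-first ordering. It is not the MRCP algorithm and not the paper's Hadoop-based EDF scheduler. It assumes a single execution slot, jobs that are all available at time zero, and known execution times. The names `Job` and `edf_schedule` are hypothetical and chosen only for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # Only the deadline takes part in ordering, so the heap pops
    # the job with the earliest deadline first.
    deadline: float
    name: str = field(compare=False)
    exec_time: float = field(compare=False)


def edf_schedule(jobs, start=0.0):
    """Run jobs one at a time on a single slot, earliest deadline first.

    Returns the execution order and the names of jobs that finish
    after their deadline. Assumes every job is ready at `start`.
    """
    heap = list(jobs)
    heapq.heapify(heap)
    clock = start
    order, late = [], []
    while heap:
        job = heapq.heappop(heap)
        clock += job.exec_time  # the job occupies the slot until it finishes
        order.append(job.name)
        if clock > job.deadline:
            late.append(job.name)
    return order, late


jobs = [Job(30, "j1", 10), Job(12, "j2", 8), Job(15, "j3", 5)]
order, late = edf_schedule(jobs)
print(order, late)
```

A real Hadoop scheduler must additionally match map and reduce tasks to slots across many nodes and cope with jobs arriving over time, both of which this sketch leaves out.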

Published In

ICPE '15: Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering
January 2015
366 pages
ISBN:9781450332484
DOI:10.1145/2668930

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. constraint programming
  2. hadoop scheduler
  3. mapreduce with deadlines
  4. resource management on clouds

Qualifiers

  • Research-article

Conference

ICPE'15
ICPE'15: ACM/SPEC International Conference on Performance Engineering
January 28 - February 4, 2015
Austin, Texas, USA

Acceptance Rates

ICPE '15 Paper Acceptance Rate 23 of 74 submissions, 31%;
Overall Acceptance Rate 252 of 851 submissions, 30%

Cited By

  • (2022) PAS. Security and Communication Networks, 2022. DOI: 10.1155/2022/8598305. Online publication date: 1-Jan-2022
  • (2022) Energy Utilization Task Scheduling for MapReduce in Heterogeneous Clusters. IEEE Transactions on Services Computing, 15:2 (931-944). DOI: 10.1109/TSC.2020.2966697. Online publication date: 1-Mar-2022
  • (2019) Core group placement: allocation and provisioning of heterogeneous resources. EURO Journal on Computational Optimization, 7:3 (243-264). DOI: 10.1007/s13675-018-0095-9. Online publication date: Sep-2019
  • (2018) POSUM: A Portfolio Scheduler for MapReduce Workloads. 2018 IEEE International Conference on Big Data (Big Data) (351-357). DOI: 10.1109/BigData.2018.8622215. Online publication date: Dec-2018
  • (2018) Leveraging Cloud Computing and Sensor-Based Devices in the Operation and Management of Smart Systems. Handbook of Smart Cities (55-80). DOI: 10.1007/978-3-319-97271-8_3. Online publication date: 16-Nov-2018
  • (2017) Dynamic deadline-constraint scheduler for Hadoop YARN. 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) (1-8). DOI: 10.1109/UIC-ATC.2017.8397643. Online publication date: Aug-2017
  • (2017) MRCP-RM. IEEE Transactions on Parallel and Distributed Systems, 28:5 (1375-1389). DOI: 10.1109/TPDS.2016.2617324. Online publication date: 1-May-2017
  • (2017) Performance Improvement of MapReduce for Heterogeneous Clusters Based on Efficient Locality and Replica Aware Scheduling (ELRAS) Strategy. Wireless Personal Communications: An International Journal, 95:3 (2709-2733). DOI: 10.1007/s11277-017-3953-5. Online publication date: 1-Aug-2017
