Article

Coscheduling in Clusters: Is It a Viable Alternative?

Authors:

Chita R. Das

SC '04: Proceedings of the 2004 ACM/IEEE conference on Supercomputing

Page 16

https://doi.org/10.1109/SC.2004.20

Published: 06 November 2004

Abstract

In this paper, we conduct an in-depth evaluation of a broad spectrum of scheduling alternatives for clusters. These include the widely used batch scheduling, local scheduling, gang scheduling, all prior communication-driven coscheduling algorithms (Dynamic Coscheduling (DCS), Spin Block (SB), Periodic Boost (PB), and Co-ordinated Coscheduling (CC)), and a newly proposed HYBRID coscheduling algorithm, all evaluated on a 16-node, Myrinet-connected Linux cluster. Performance and energy measurements using several NAS, LLNL and ANL benchmarks on this cluster lead to several interesting conclusions. First, although batch scheduling is currently used in most clusters, all blocking-based coscheduling techniques such as SB, CC and HYBRID, as well as gang scheduling, can provide much better performance even on a dedicated cluster platform. Second, in contrast to some prior studies, we observe that blocking-based schemes like SB and HYBRID can provide better performance than spin-based techniques like PB on a Linux platform. Third, the proposed HYBRID scheduling provides the best performance-energy behavior and can be implemented on any cluster with little effort. Together, these results suggest that blocking-based coscheduling techniques are viable candidates for use in clusters, with significant performance-energy benefits.
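The abstract contrasts spin-based and blocking-based waiting but does not describe the mechanisms themselves. As a rough illustration only, the sketch below assumes the usual description of a spin-block wait: a process polls for an incoming message for a bounded time and, if nothing arrives, gives up the CPU and blocks. It is a user-level toy, not the paper's implementation; `spin_block_wait`, `demo`, and the `spin_limit_s` parameter are hypothetical names, and a `threading.Event` stands in for message arrival.

```python
import threading
import time


def spin_block_wait(arrived: threading.Event, spin_limit_s: float) -> str:
    """Wait for `arrived`; report which phase observed it ("spin" or "block")."""
    deadline = time.monotonic() + spin_limit_s
    # Phase 1 (spin): keep polling, betting that the sender is running right now.
    while time.monotonic() < deadline:
        if arrived.is_set():
            return "spin"
    # Phase 2 (block): stop burning CPU and sleep until the message shows up.
    arrived.wait()
    return "block"


def demo(delay_s: float, spin_limit_s: float) -> str:
    """Simulate a message that arrives after `delay_s` seconds (assumed setup)."""
    arrived = threading.Event()
    timer = threading.Timer(delay_s, arrived.set)  # stand-in for the remote sender
    timer.start()
    try:
        return spin_block_wait(arrived, spin_limit_s)
    finally:
        timer.join()


if __name__ == "__main__":
    print(demo(delay_s=0.0, spin_limit_s=1.0))   # message arrives while spinning
    print(demo(delay_s=0.5, spin_limit_s=0.01))  # spin budget runs out, then block
```

In this toy, a longer spin limit behaves more like a purely spin-based scheme, while a limit near zero behaves like immediate blocking; how that trade-off plays out on a real cluster is what the paper's measurements address.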

Published In

SC '04: Proceedings of the 2004 ACM/IEEE conference on Supercomputing

November 2004

724 pages

ISBN:0769521533

General Chair: Jeff Huskamp

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

IEEE Computer Society

United States

Conference

SC '04: International Conference for High Performance Computing, Networking, Storage and Analysis

November 6 - 12, 2004

Acceptance Rates

SC '04 Paper Acceptance Rate 60 of 200 submissions, 30%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Cited By

Collins A, Harris T, Cole M, Fensch C, Hoefler T, Iskra K (2015). LIRA. Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers, pages 1-8. DOI: 10.1145/2768405.2768407. Online publication date: 16-Jun-2015
https://dl.acm.org/doi/10.1145/2768405.2768407
Sajjapongse K, Wang X, Becchi M, Parashar M, Weissman J, Epema D, Figueiredo R (2013). A preemption-based runtime to efficiently schedule multi-process applications on heterogeneous clusters with GPUs. Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, pages 179-190. DOI: 10.1145/2493123.2462911. Online publication date: 17-Jun-2013
https://dl.acm.org/doi/10.1145/2493123.2462911
Sajjapongse K, Wang X, Becchi M, Parashar M, Weissman J, Epema D, Figueiredo R (2013). A preemption-based runtime to efficiently schedule multi-process applications on heterogeneous clusters with GPUs. Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, pages 179-190. DOI: 10.1145/2462902.2462911. Online publication date: 17-Jun-2013
https://dl.acm.org/doi/10.1145/2462902.2462911
Rajaei H, Dadfar M, Joshi P, Nicol D, Fujimoto R (2006). Simulation of job scheduling for small scale clusters. Proceedings of the 38th conference on Winter simulation, pages 1195-1201. DOI: 10.5555/1218112.1218327. Online publication date: 3-Dec-2006
https://dl.acm.org/doi/10.5555/1218112.1218327
Ever E, Gemikonakli O, Chakka R (2006). A Mathematical Model for Performability of Beowulf Clusters. Proceedings of the 39th annual Symposium on Simulation, pages 118-126. DOI: 10.1109/ANSS.2006.6. Online publication date: 2-Apr-2006
https://dl.acm.org/doi/10.1109/ANSS.2006.6
Frachtenberg E, Feitelson D, Petrini F, Fernandez J (2005). Adaptive Parallel Job Scheduling with Flexible Coscheduling. IEEE Transactions on Parallel and Distributed Systems, 16:11, pages 1066-1077. DOI: 10.1109/TPDS.2005.130. Online publication date: 1-Nov-2005
https://dl.acm.org/doi/10.1109/TPDS.2005.130
