DOI: 10.1109/SC.2004.20
Article

Coscheduling in Clusters: Is It a Viable Alternative?

Published: 06 November 2004

Abstract

In this paper, we conduct an in-depth evaluation of a broad spectrum of scheduling alternatives for clusters. These include the widely used batch scheduling, local scheduling, gang scheduling, all prior communication-driven coscheduling algorithms (Dynamic Coscheduling (DCS), Spin Block (SB), Periodic Boost (PB), and Co-ordinated Coscheduling (CC)), and a newly proposed HYBRID coscheduling algorithm, all evaluated on a 16-node, Myrinet-connected Linux cluster. Performance and energy measurements using several NAS, LLNL, and ANL benchmarks on this cluster lead to several interesting conclusions. First, although batch scheduling is currently used in most clusters, all blocking-based coscheduling techniques such as SB, CC, and HYBRID, as well as gang scheduling, can provide much better performance even on a dedicated cluster platform. Second, in contrast to some prior studies, we observe that blocking-based schemes like SB and HYBRID can provide better performance than spin-based techniques like PB on a Linux platform. Third, the proposed HYBRID scheduling provides the best performance-energy behavior and can be implemented on any cluster with little effort. Together, these results suggest that blocking-based coscheduling techniques are viable candidates for use in clusters, offering significant performance-energy benefits.

Information

Published In

SC '04: Proceedings of the 2004 ACM/IEEE conference on Supercomputing
November 2004
724 pages
ISBN:0769521533

Publisher

IEEE Computer Society

United States

Author Tags

  1. Batch Scheduling
  2. Coscheduling
  3. Energy Consumption
  4. Experimentation
  5. Gang Scheduling
  6. Linux Cluster
  7. Myrinet
  8. Performance Measurement
  9. Scheduling

Qualifiers

  • Article

Conference

SC '04

Acceptance Rates

SC '04 Paper Acceptance Rate 60 of 200 submissions, 30%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Cited By

  • (2015) LIRA. Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers, pages 1-8. DOI: 10.1145/2768405.2768407. Online publication date: 16-Jun-2015
  • (2013) A preemption-based runtime to efficiently schedule multi-process applications on heterogeneous clusters with GPUs. Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, pages 179-190. DOI: 10.1145/2493123.2462911. Online publication date: 17-Jun-2013
  • (2013) A preemption-based runtime to efficiently schedule multi-process applications on heterogeneous clusters with GPUs. Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, pages 179-190. DOI: 10.1145/2462902.2462911. Online publication date: 17-Jun-2013
  • (2006) Simulation of job scheduling for small scale clusters. Proceedings of the 38th conference on Winter simulation, pages 1195-1201. DOI: 10.5555/1218112.1218327. Online publication date: 3-Dec-2006
  • (2006) A Mathematical Model for Performability of Beowulf Clusters. Proceedings of the 39th annual Symposium on Simulation, pages 118-126. DOI: 10.1109/ANSS.2006.6. Online publication date: 2-Apr-2006
  • (2005) Adaptive Parallel Job Scheduling with Flexible Coscheduling. IEEE Transactions on Parallel and Distributed Systems, 16:11, pages 1066-1077. DOI: 10.1109/TPDS.2005.130. Online publication date: 1-Nov-2005
