DOI: 10.1145/1122971.1122988

Adaptive scheduling with parallelism feedback

Published: 29 March 2006

Abstract

Multiprocessor scheduling in a shared multiprogramming environment is often structured as two-level scheduling, where a kernel-level job scheduler allots processors to jobs and a user-level task scheduler schedules the work of a job on the allotted processors. In this context, the number of processors allotted to a particular job may vary during the job's execution, and the task scheduler must adapt to these changes in processor resources. For overall system efficiency, the task scheduler should also provide parallelism feedback to the job scheduler to avoid the situation where a job is allotted processors that it cannot use productively.

We present an adaptive task scheduler for multitasked jobs with dependencies that provides continual parallelism feedback to the job scheduler in the form of requests for processors. Our scheduler guarantees that a job completes near-optimally while utilizing at least a constant fraction of the allotted processor cycles. Our scheduler can be applied to schedule data-parallel programs, such as those written in High Performance Fortran (HPF), *Lisp, C*, NESL, and ZPL.

Our analysis models the job scheduler as the task scheduler's adversary, challenging the task scheduler to be robust to the system environment and the job scheduler's administrative policies. For example, the job scheduler can make available a huge number of processors exactly when the job has little use for them. To analyze the performance of our adaptive task scheduler under this stringent adversarial assumption, we introduce a new technique called "trim analysis," which allows us to prove that our task scheduler performs poorly on at most a small number of time steps, while exhibiting near-optimal behavior on the vast majority.

To be precise, suppose that a job has work T_1 and critical-path length T_∞ and is running on a machine with P processors. Using trim analysis, we prove that our scheduler completes the job in O(T_1/P̃ + T_∞ + L lg P) time steps, where L is the length of a scheduling quantum and P̃ denotes the O(T_∞ + L lg P)-trimmed availability, that is, the average of the processor availability over all time steps excluding the O(T_∞ + L lg P) time steps with the highest processor availability. When T_1/T_∞ ≫ P̃ (the job's parallelism dominates the trimmed availability), the job achieves nearly perfect linear speedup. Conversely, when T_1/T_∞ ≪ P̃, the asymptotic running time of the job is nearly the length of its critical path.
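The feedback loop and the trim statistic described above can be illustrated concretely. The Python sketch below is a minimal illustration under stated assumptions, not the paper's actual algorithm: the multiplicative raise/lower rule for the processor request, the parameters delta (utilization threshold) and rho (growth factor), and all function names are hypothetical choices for the demo. Only the definition of trimmed availability, the average availability after discarding the highest-availability time steps, is taken directly from the abstract.

```python
# Illustrative sketch only: the feedback rule and its parameters are
# assumptions, not the algorithm from the paper.
from typing import Callable, List


def trimmed_availability(avail: List[int], trim: int) -> float:
    """R-trimmed availability: the mean processor availability over all
    time steps, excluding the `trim` steps with the highest availability.
    (The abstract's analysis uses R = O(T_inf + L lg P).)"""
    kept = sorted(avail)[:max(0, len(avail) - trim)]
    return sum(kept) / len(kept) if kept else 0.0


def adaptive_task_scheduler(total_work: int,
                            parallelism: Callable[[int], int],
                            availability: Callable[[int], int],
                            quantum_len: int = 10,
                            delta: float = 0.8,   # utilization threshold (assumed)
                            rho: float = 2.0      # growth factor (assumed)
                            ) -> int:
    """Two-level feedback loop: each quantum, request `desire` processors;
    the job scheduler (possibly adversarial) allots min(desire,
    availability); run the quantum; then raise the request if the
    allotment was used efficiently and lower it otherwise. Assumes
    availability(q) >= 1. Returns the number of quanta to completion."""
    desire, done, q = 1, 0, 0
    while done < total_work:
        allotted = min(desire, availability(q))     # job scheduler's decision
        used = 0
        for _ in range(quantum_len):                # execute one quantum
            step = min(allotted, parallelism(q), total_work - done)
            done += step
            used += step
            if done >= total_work:
                break
        if used >= delta * allotted * quantum_len:  # efficient quantum:
            desire = int(desire * rho)              #   ask for more
        else:                                       # inefficient quantum:
            desire = max(1, int(desire / rho))      #   back off
        q += 1
    return q


if __name__ == "__main__":
    import random
    random.seed(0)
    # Availability trace with rare huge spikes the job cannot exploit.
    avail = [random.choice([1, 2, 4, 64]) for _ in range(1000)]
    quanta = adaptive_task_scheduler(
        total_work=50_000,
        parallelism=lambda q: 8,                    # constant usable parallelism
        availability=lambda q: avail[q % len(avail)],
    )
    print("quanta to completion:", quanta)
    print("50-trimmed availability:", trimmed_availability(avail, trim=50))
```

Under such a rule, the request grows geometrically while allotments are being used efficiently and shrinks when they are not, so the task scheduler avoids holding processors it cannot use productively; the trimmed statistic, in turn, discounts the adversary's option of offering a huge allotment exactly when the job has little use for it.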


Published In

PPoPP '06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
March 2006
258 pages
ISBN: 1595931899
DOI: 10.1145/1122971

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. adaptive scheduling
  2. adversary
  3. critical path
  4. data-parallel computing
  5. greedy scheduling
  6. instantaneous parallelism
  7. job scheduling
  8. multiprocessing
  9. multiprogramming
  10. parallel computation
  11. parallelism feedback
  12. processor allocation
  13. space sharing
  14. task scheduling
  15. trim analysis
  16. two-level scheduling
  17. work

Qualifiers

  • Article

Conference

PPoPP06

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%



Cited By

  • (2024) Cloud-native Workflow Scheduling using a Hybrid Priority Rule, Dynamic Resource Allocation, and Dynamic Task Partition. Proceedings of the 2024 ACM Symposium on Cloud Computing, pages 830-846. DOI: 10.1145/3698038.3698551. Online publication date: 20-Nov-2024.
  • (2022) Cloud-native workflow scheduling using a hybrid priority rule and dynamic task parallelism. Proceedings of the 13th Symposium on Cloud Computing, pages 72-77. DOI: 10.1145/3542929.3563495. Online publication date: 7-Nov-2022.
  • (2020) Resource-Aware Data Parallel Array Processing. International Journal of Parallel Programming. DOI: 10.1007/s10766-020-00664-0. Online publication date: 9-Jun-2020.
  • (2019) A dynamic and QoS-effective resource management system. International Journal of High Performance Computing and Networking, 13(3):261-282. DOI: 10.5555/3337645.3337647. Online publication date: 1-Jan-2019.
  • (2019) Semi-Clairvoyant Scheduling in Data Analytics Systems. IEEE Transactions on Computers. DOI: 10.1109/TC.2019.2906193. Online publication date: 2019.
  • (2018) COBRA: Toward Provably Efficient Semi-Clairvoyant Scheduling in Data Analytics Systems. IEEE INFOCOM 2018 - IEEE Conference on Computer Communications, pages 513-521. DOI: 10.1109/INFOCOM.2018.8485981. Online publication date: Apr-2018.
  • (2018) Scheduling Parallelizable Jobs Online to Maximize Throughput. LATIN 2018: Theoretical Informatics, pages 755-776. DOI: 10.1007/978-3-319-77404-6_55. Online publication date: 13-Mar-2018.
  • (2017) Brief Announcement. Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pages 87-89. DOI: 10.1145/3087556.3087590. Online publication date: 24-Jul-2017.
  • (2016) Scheduling parallel DAG jobs online to minimize average flow time. Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, pages 176-189. DOI: 10.5555/2884435.2884449. Online publication date: 10-Jan-2016.
  • (2016) Work stealing for interactive services to meet target latency. ACM SIGPLAN Notices, 51(8):1-13. DOI: 10.1145/3016078.2851151. Online publication date: 27-Feb-2016.
