|
ABSTRACT
Multiprocessor scheduling in a shared multiprogramming environment is often structured as two-level scheduling, where a kernel-level job scheduler allots processors to jobs and a user-level task scheduler schedules the work of a job on the allotted processors. In this context, the number of processors allotted to a particular job may vary during the job's execution, and the task scheduler must adapt to these changes in processor resources. For overall system efficiency, the task scheduler should also provide parallelism feedback to the job scheduler to avoid the situation where a job is allotted processors that it cannot use productively.We present an adaptive task scheduler for multitasked jobs with dependencies that provides continual parallelism feedback to the job scheduler in the form of requests for processors. Our scheduler guarantees that a job completes near optimally while utilizing at least a constant fraction of the allotted processor cycles. Our scheduler can be applied to schedule data-parallel programs, such as those written in High Performance Fortran (HPF), *Lisp, C*, NESL, and ZPL.Our analysis models the job scheduler as the task scheduler's adversary, challenging the task scheduler to be robust to the system environment and the job scheduler's administrative policies. For example, the job scheduler can make available a huge number of processors exactly when the job has little use for them. To analyze the performance of our adaptive task scheduler under this stringent adversarial assumption, we introduce a new technique called "trim analysis," which allows us to prove that our task scheduler performs poorly on at most a small number of time steps, exhibiting near-optimal behavior on the vast majority.To be precise, suppose that a job has work T1 and critical-path length T∞ and is running on a machine with P processors. Using trim analysis, we prove that our scheduler completes the job in O(T1/P + T∞ + Llg P) time steps, where L is the length of a scheduling quantum and P denotes the O(T∞ + L lg P)-trimmed availability. This quantity is the average of the processor availability over all time steps excluding the O(T∞ + L lg P) time steps with the highest processor availability. When T1/T∞ >> P (the job's parallelism dominates the O(T∞ + L lg P)-trimmed availability), the job achieves nearly perfect linear speedup. Conversely, when T1/T∞ << P, the asymptotic running time of the job is nearly the length of its critical path.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
 |
3
|
|
| |
4
|
|
 |
5
|
Guy E. Blelloch , Phillip B. Gibbons , Yossi Matias, Provably efficient scheduling for languages with fine-grained parallelism, Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, p.1-12, June 24-26, 1995, Santa Barbara, California, United States
[doi> 10.1145/215399.215403]
|
 |
6
|
|
| |
7
|
|
 |
8
|
Robert D. Blumofe , Christopher F. Joerg , Bradley C. Kuszmaul , Charles E. Leiserson , Keith H. Randall , Yuli Zhou, Cilk: an efficient multithreaded runtime system, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.207-216, July 19-21, 1995, Santa Barbara, California, United States
|
| |
9
|
|
 |
10
|
|
| |
11
|
Robert D. Blumofe and Philip A. Lisiecki. Adaptive and reliable parallel computing on networks of workstations. In Proceedings of the USENIX Annual Technical Symposion, pages 133--147, 1997.
|
| |
12
|
Robert D. Blumofe and Dionisios Papadopoulos. Hood: a user-level threads library for multiprogrammed multiprocessors. Technical report, University of Texas at Austin, 1999.
|
| |
13
|
Robert D. Blumofe and David S. Park. Scheduling large-scale parallel computations on networks of workstations. In Proceedings of the IEEE International Symposium on High-Performance Distributed Computing, pages 96--105, 1994.
|
 |
14
|
|
 |
15
|
|
| |
16
|
Bradford L. Chamberlain , Sung-Eun Choi , E. Christopher Lewis , Calvin Lin , Lawrence Snyder , W. Derrick Weathersby, ZPL: A Machine Independent Programming Language for Parallel Computers, IEEE Transactions on Software Engineering, v.26 n.3, p.197-211, March 2000
[doi> 10.1109/32.842947
]
|
 |
17
|
|
| |
18
|
Xiaotie Deng , Nian Gu , Tim Brecht , KaiCheng Lu, Preemptive scheduling of parallel jobs on multiprocessors, Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms, p.159-167, January 28-30, 1996, Atlanta, Georgia, United States
|
| |
19
|
|
| |
20
|
|
| |
21
|
|
 |
22
|
|
| |
23
|
|
| |
24
|
|
| |
25
|
Dror G. Feitelson. Job scheduling in multiprogrammed parallel systems (extended version). Technical report, IBM Research Report RC 19790 (87657) 2nd Revision, 1997.
|
| |
26
|
High Performance Fortran Forum. High Performance Fortran language specification version 1.0. Technical report, Rice University, 1993.
|
 |
27
|
Matteo Frigo , Charles E. Leiserson , Keith H. Randall, The implementation of the Cilk-5 multithreaded language, Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, p.212-223, June 17-19, 1998, Montreal, Quebec, Canada
|
| |
28
|
|
 |
29
|
|
| |
30
|
R. L. Graham. Bounds on multiprocessing anomalies. SIAM Journal on Applied Mathematics, pages 17(2):416--429, 1969.
|
| |
31
|
Nian Gu. Competitive analysis of dynamic processor allocation strategies. Master's thesis, York University, 1995.
|
 |
32
|
|
| |
33
|
|
 |
34
|
D. A. Kranz , R. H. Halstead, Jr. , E. Mohr, Mul-T: a high-performance parallel Lisp, Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation, p.81-90, June 19-23, 1989, Portland, Oregon, United States
|
| |
35
|
C. Lasser. The Essential *Lisp Manual. Thinking Machines Corporation, 1986.
|
 |
36
|
|
 |
37
|
S. Majumdar , D. L. Eager , R. B. Bunt, Scheduling in multiprogrammed parallel systems, Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems, p.104-113, May 24-27, 1988, Santa Fe, New Mexico, United States
|
| |
38
|
Xavier Martorell , Julita Corbalán , Dimitrios S. Nikolopoulos , Nacho Navarro , Eleftherios D. Polychronopoulos , Theodore S. Papatheodorou , Jesús Labarta, A Tool to Schedule Parallel Applications on Multiprocessors: The NANOS CPU MANAGER, Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing, p.87-112, May 01, 2000
|
 |
39
|
|
 |
40
|
Eric Mohr , David A. Kranz , Robert H. Halstead, Jr., Lazy task creation: a technique for increasing the granularity of parallel programs, Proceedings of the 1990 ACM conference on LISP and functional programming, p.185-197, June 27-29, 1990, Nice, France
[doi> 10.1145/91556.91631]
|
| |
41
|
Rajeev Motwani , Steven Phillips , Eric Torng, Non-clairvoyant scheduling, Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, p.422-431, January 25-27, 1993, Austin, Texas, United States
|
 |
42
|
|
 |
43
|
|
| |
44
|
Liang Peng, Weng~Fai Wong, Ming Dong Feng, and Chung Kwong Yuen. SilkRoad: a multithreaded runtime system with software distributed shared memory for SMP clusters. In IEEE International Conferrence on Cluster Computing, pages 243--249, 2000.
|
| |
45
|
J. R. Rose and G. L. Steele Jr. C*: An extended C language for data parallel programming. In Proceedings of Supercomputing, pages 2--16, 1987.
|
| |
46
|
|
| |
47
|
|
| |
48
|
Siddhartha Sen. Dynamic processor allocation for adaptively parallel jobs. Master's thesis, Massachusetts Institute of Technology, 2004.
|
| |
49
|
B. Song. Scheduling adaptively parallel jobs. Master's thesis, Massachusetts Institute of Technology, 1998.
|
| |
50
|
|
 |
51
|
|
| |
52
|
|
 |
53
|
|
 |
54
|
John Turek , Walter Ludwig , Joel L. Wolf , Lisa Fleischer , Prasoon Tiwari , Jason Glasgow , Uwe Schwiegelshohn , Philip S. Yu, Scheduling parallelizable tasks to minimize average response time, Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures, p.200-209, June 27-29, 1994, Cape May, New Jersey, United States
[doi> 10.1145/181014.181331]
|
| |
55
|
K. K. Yue and D. J. Lilja. Implementing a dynamic processor allocation policy for multiprogrammed parallel applications in the solaris (tm) operating system. Concurrency and Computation-Practice and Experience, 13(6):449--464, 2001.
|
INDEX TERMS
Primary Classification:
D.
Software
D.4
OPERATING SYSTEMS
D.4.1
Process Management
Additional Classification:
F.
Theory of Computation
F.2
ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY
General Terms:
Algorithms,
Performance,
Theory
Keywords:
adaptive scheduling,
adversary,
critical path,
data-parallel computing,
greedy scheduling,
instantaneous parallelism,
job scheduling,
multiprocessing,
multiprogramming,
parallel computation,
parallelism feedback,
processor allocation,
space sharing,
task scheduling,
trim analysis,
two-level scheduling,
work
|