|
ABSTRACT
We present an adaptive work-stealing thread scheduler, A-Steal, for fork-join multithreaded jobs, like those written using the Cilk multithreaded language or the Hood work-stealing library. The A-Steal algorithm is appropriate for large parallel servers where many jobs share a common multiprocessor resource and in which the number of processors available to a particular job may vary during the job's execution. A-Steal provides continual parallelism feedback to a job scheduler in the form of processor requests, and the job must adaptits execution to the processors allotted to it. Assuming that the job scheduler never allots any job more processors than requested by thejob's thread scheduler, A-Steal guarantees that the job completes in near-optimal time while utilizing at least a constant fraction of the allotted processors. Our analysis models the job scheduler as the thread scheduler's adversary, challenging the thread scheduler to be robust to the system environment and the job scheduler's administrative policies. We analyze the performance of A-Steal using "trim analysis," which allows us to prove that our thread scheduler performs poorly on at most a small number of time steps, while exhibiting near-optimal behavior on the vast majority. To be precise, suppose that a job has work T1 and span (critical-path length)T∞. On a machine with P processors, A-Steal completes the job in expected O(T1/P + T∞ + L lg P) time steps, where L is the length of a scheduling quantum and P denotes the O(T∞ + L lg P)-trimmed availability. This quantity is the average of the processor availability over all but the O(T∞ + L lg P)time steps having the highest processor availability. When the job's parallelism dominates the trimmed availability, that is, P « T1/T∞, the job achieves nearly perfect linear speedup. Conversely, when the trimmed mean dominates the parallelism, the asymptotic running time of the job is nearly its span.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Umut A. Acar , Guy E. Blelloch , Robert D. Blumofe, The data locality of work stealing, Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures, p.1-12, July 09-13, 2000, Bar Harbor, Maine, United States
[doi> 10.1145/341800.341801]
|
 |
2
|
Kunal Agrawal , Yuxiong He , Wen Jing Hsu , Charles E. Leiserson, Adaptive scheduling with parallelism feedback, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, March 29-31, 2006, New York, New York, USA
[doi> 10.1145/1122971.1122988]
|
| |
3
|
|
 |
4
|
|
 |
5
|
|
| |
6
|
|
 |
7
|
|
 |
8
|
Guy E. Blelloch , Phillip B. Gibbons , Yossi Matias, Provably efficient scheduling for languages with fine-grained parallelism, Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, p.1-12, June 24-26, 1995, Santa Barbara, California, United States
[doi> 10.1145/215399.215403]
|
 |
9
|
|
| |
10
|
|
 |
11
|
Robert D. Blumofe , Christopher F. Joerg , Bradley C. Kuszmaul , Charles E. Leiserson , Keith H. Randall , Yuli Zhou, Cilk: an efficient multithreaded runtime system, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.207-216, July 19-21, 1995, Santa Barbara, California, United States
|
| |
12
|
|
 |
13
|
|
| |
14
|
Robert D. Blumofe, Charles E. Leiserson, and Bin Song. Automatic processor allocation for work-stealing jobs. 1998.
|
| |
15
|
Robert D. Blumofe and Philip A. Lisiecki. Adaptive and reliable parallel computing on networks of workstations. In USENIX, pages 133--147, Anaheim, California, 1997.
|
 |
16
|
|
| |
17
|
Robert D. Blumofe and Dionisios Papadopoulos. Hood: A user-level threads library for multiprogrammed multiprocessors. Technical report, University of Texas at Austin, 1999.
|
| |
18
|
Robert D. Blumofe and David S. Park. Scheduling large-scale parallel computations on networks of workstations. In HPDC, pages 96--105, San Francisco, California, 1994.
|
 |
19
|
|
 |
20
|
|
| |
21
|
|
| |
22
|
|
 |
23
|
|
| |
24
|
DESMO-J: A framework for discrete-event modelling and simulation. http://asi-www.informatik.uni-hamburg.de/desmoj/.
|
| |
25
|
|
 |
26
|
|
| |
27
|
|
| |
28
|
|
| |
29
|
Dror G. Feitelson. Job scheduling in multiprogrammed parallel systems (extended version). Technical report, IBM Research Report RC 19790 (87657) 2nd Revision, 1997.
|
 |
30
|
|
 |
31
|
Matteo Frigo , Charles E. Leiserson , Keith H. Randall, The implementation of the Cilk-5 multithreaded language, Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, p.212-223, June 17-19, 1998, Montreal, Quebec, Canada
|
| |
32
|
|
| |
33
|
R. L. Graham. Bounds on multiprocessing anomalies. SIAM Journal on Applied Mathematics, pages 17(2):416--429, 1969.
|
| |
34
|
Nian Gu. Competitive analysis of dynamic processor allocation strategies. Master's thesis, York University, 1995.
|
| |
35
|
Michael Halbherr, Yuli Zhou, and Chris F. Joerg. MIMD-style parallel programming with continuation-passing threads. In Proceedings of the International Workshop on Massive Parallelism: Hardware, Software, and Applications, Capri, Italy, September 1994.
|
 |
36
|
|
| |
37
|
Yuxiong He, Hsu Wen Jing, and Charles E. Leiserson. Provably efficient two-level adaptive scheduling for multithreaded jobs on parallel computers. JSSPP, 2006.
|
| |
38
|
|
 |
39
|
|
 |
40
|
|
 |
41
|
Eric Mohr , David A. Kranz , Robert H. Halstead, Jr., Lazy task creation: a technique for increasing the granularity of parallel programs, Proceedings of the 1990 ACM conference on LISP and functional programming, p.185-197, June 27-29, 1990, Nice, France
[doi> 10.1145/91556.91631]
|
| |
42
|
Rajeev Motwani , Steven Phillips , Eric Torng, Non-clairvoyant scheduling, Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, p.422-431, January 25-27, 1993, Austin, Texas, United States
|
| |
43
|
|
 |
44
|
|
| |
45
|
|
| |
46
|
|
| |
47
|
|
 |
48
|
Larry Rudolph , Miriam Slivkin-Allalouf , Eli Upfal, A simple load balancing scheme for task allocation in parallel machines, Proceedings of the third annual ACM symposium on Parallel algorithms and architectures, p.237-245, July 21-24, 1991, Hilton Head, South Carolina, United States
[doi> 10.1145/113379.113401]
|
 |
49
|
William N. Scherer, III , Doug Lea , Michael L. Scott, Scalable synchronous queues, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, March 29-31, 2006, New York, New York, USA
[doi> 10.1145/1122971.1122994]
|
| |
50
|
Siddhartha Sen. Dynamic processor allocation for adaptively parallel jobs. Master's thesis, Massachusetts Institute of technology, 2004.
|
 |
51
|
|
 |
52
|
|
 |
53
|
|
| |
54
|
B. Song. Scheduling adaptively parallel jobs. Master's thesis, Massachusetts Institute of Technology, 1998.
|
| |
55
|
|
 |
56
|
|
| |
57
|
|
| |
58
|
K. K. Yue and D. J. Lilja. Implementing a dynamic processor allocation policy for multiprogrammed parallel applications in the Solaris™ operating system. Concurrency and Computation-Practice and Experience, 13(6):449--464, 2001.
|
INDEX TERMS
Primary Classification:
D.
Software
D.4
OPERATING SYSTEMS
D.4.1
Process Management
Subjects:
Scheduling
Additional Classification:
D.
Software
D.4
OPERATING SYSTEMS
D.4.1
Process Management
Subjects:
Multiprocessing/multiprogramming/multitasking
F.
Theory of Computation
F.2
ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY
F.2.0
General
General Terms:
Algorithms,
Performance,
Theory
Keywords:
adaptive scheduling,
adversary,
distributed scheduling,
job scheduling,
multiprogramming,
multithreaded languages,
parallel computation,
parallelism feedback,
processor allocation,
space sharing,
thread scheduling,
trim analysis,
two-level scheduling,
work stealing
|