|
ABSTRACT
Many of today's high-level parallel languages support dynamic,
fine-grained parallelism. These languages allow the user to expose all the parallelism in the program, which is typically of a much higher degree than the number of processors. Hence an efficient scheduling algorithm is required to assign computations to processors at runtime. Besides having low overheads and good load balancing, it is important for the scheduling algorithm to minimize the space usage of the parallel program. This article presents an on-line scheduling algorithm that is provably space efficient and time efficient for nested-parallel languages. For a computation with depth D and serial space requirement S1, the algorithm generates a schedule that requires at most S1 + O(K•D•p) space (including scheduler space) on p processors. Here, K is a user-adjustable runtime parameter specifying the net amount of memory that a thread may allocate before it is preempted by the scheduler. Adjusting the value of K provides a trade-off between the running time and the memory requirement of a parallel computation. To allow the scheduler to scale with the number of processors we also parallelize the scheduler and analyze the space and time bounds of the computation to include scheduling costs. In addition to showing that the scheduling algorithm is space and time efficient in theory, we demonstrate that it is effective in practice. We have implemented a runtime system that uses our algorithm to schedule lightweight parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques, without compromising performance.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
| |
3
|
|
 |
4
|
Guy E. Blelloch , Phillip B. Gibbons , Yossi Matias, Provably efficient scheduling for languages with fine-grained parallelism, Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures, p.1-12, June 24-26, 1995, Santa Barbara, California, United States
[doi> 10.1145/215399.215403]
|
 |
5
|
Robert D. Blumofe , Matteo Frigo , Christopher F. Joerg , Charles E. Leiserson , Keith H. Randall, An analysis of dag-consistent distributed shared-memory algorithms, Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures, p.297-308, June 24-26, 1996, Padua, Italy
[doi> 10.1145/237502.237574]
|
 |
6
|
Robert D. Blumofe , Christopher F. Joerg , Bradley C. Kuszmaul , Charles E. Leiserson , Keith H. Randall , Yuli Zhou, Cilk: an efficient multithreaded runtime system, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.207-216, July 19-21, 1995, Santa Barbara, California, United States
|
 |
7
|
|
| |
8
|
BLUMOFE~ R. D. AND LEISERSON~ C. E. 1994. Scheduling multithreaded computations by work stealing. In Proc. 35th IEEE Syrup. on Foundations of Computer Science, pp. 356-368.
|
| |
9
|
|
| |
10
|
BURTON~ F. W. AND SIMPSON~ D. J. 1994. Space efficient execution of deterministic parallel programs. Manuscript.
|
 |
11
|
|
| |
12
|
|
| |
13
|
|
 |
14
|
|
| |
15
|
|
| |
16
|
|
 |
17
|
|
| |
18
|
|
| |
19
|
FREEH, V. W., LOWENTHAL, D. K., AND ANDREWS, G. R. 1994. Distributed filaments: efficient fine-grain parallelism on a cluster of workstations. In 1st Symposium on Operating Systems Design and Implementation, Monterey, CA, pp. 201-212.
|
| |
20
|
GOLDSTEIN, S. C., CULLER, D. E., AND SCHAUSER, K. E. 1995. Enabling primitives for compiling parallel languages. In 3rd Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers, Rochester, NY.
|
| |
21
|
CJREENGARD, L. 1987. The rapid evaluation of potential fields in particle systems. The MIT Press.
|
| |
22
|
HALBHERR, M., ZHOU, Y., AND JOERG, C. F. 1994. Parallel programming based on continuationpassing thread. In Proc. 2nd International Workshop on Massive Parallelism: Hardware, Software and Applications, Capri, Italy.
|
 |
23
|
|
| |
24
|
HPF FORUM 1993. High Performance Fortran language specification, Version 1.0.
|
 |
25
|
Wilson C. Hsieh , Paul Wang , William E. Weihl, Computation migration: enhancing locality for distributed-memory parallel systems, Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.239-248, May 19-22, 1993, San Diego, California, United States
|
| |
26
|
|
 |
27
|
|
| |
28
|
IEEE 1985. Threads extension for portable operating systems (draft 6).
|
| |
29
|
|
| |
30
|
MELLOR-CRUMMEY, J. M. 1987. Concurrent queues: Practical Fetch-and-q) algorithms. Technical Report 229 (Nov.), University of Rochester.
|
| |
31
|
MILLS, P. H., NYLAND, L. S., PRINS, J. F., REIF, J. H., AND WAGNER, R. A. 1990. Prototyping parallel and distributed programs in Proteus. Technical Report UNC-CH TR90-041, Computer Science Department, University of North Carolina.
|
| |
32
|
|
| |
33
|
MUELLER, F. 1993. A library implementation of POSIX threads under unix. In Proc. Winter 1993 USENIX Technical Conference and Exhibition, San Diego, CA, USA, pp. 29-41.
|
| |
34
|
NARLIKAR, G. J. 1999. Space-efficient multithreading. School of Computer Science, Carnegie Mellon University. Ph.D. thesis, to appear.
|
| |
35
|
NARLIKAR, CJ. J. AND BLELLOCH, CJ. E. 1996. A framework for space and time efficient scheduling of parallelism. Technical Report CMU-CS-96-197, Computer Science Department, Carnegie Mellon University.
|
| |
36
|
|
| |
37
|
|
| |
38
|
POWELL, M. L., I~LEIMAN, S. R., BARTON, S., SHAH, D., STEIN, D., AND WEEKS, M. 1991. SunOS multi-thread architecture. In USENIX ASSOCIATION (Ed.), Proc. Winter 1991 USENIX Conference: Dallas, TX, USA, pp. 65-80.
|
| |
39
|
|
| |
40
|
|
 |
41
|
|
| |
42
|
|
| |
43
|
STRASSEN, V. 1969. Gaussian elimination is not optimal. Numerische Mathematik 13, 354-356.
|
| |
44
|
|
| |
45
|
|
CITED BY 9
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Shimin Chen , Phillip B. Gibbons , Michael Kozuch , Vasileios Liaskovitis , Anastassia Ailamaki , Guy E. Blelloch , Babak Falsafi , Limor Fix , Nikos Hardavellas , Todd C. Mowry , Chris Wilkerson, Scheduling threads for constructive cache sharing on CMPs, Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, June 09-11, 2007, San Diego, California, USA
|
|
Kunal Agrawal , Yuxiong He , Wen Jing Hsu , Charles E. Leiserson, Adaptive scheduling with parallelism feedback, Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming, March 29-31, 2006, New York, New York, USA
|
|
|
REVIEW
"M. S. Joy : Reviewer"
The authors consider the problem of process scheduling on a
shared-memory parallel machine where the parallelism that can be
specified in a user program is substantially greater than the number of
available processors. A new asynchro
more...
Peer to Peer - Readers of this Article have also read:
-
Data structures for quadtree approximation and compression
Communications of the ACM
28, 9
Hanan Samet
-
A hierarchical single-key-lock access control using the Chinese remainder theorem
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee
, Huizhu Lu
, D. D. Fisher
-
The GemStone object database management system
Communications of the ACM
34, 10
Paul Butterworth
, Allen Otis
, Jacob Stein
-
Putting innovation to work: adoption strategies for multimedia communication systems
Communications of the ACM
34, 12
Ellen Francik
, Susan Ehrlich Rudman
, Donna Cooper
, Stephen Levine
-
An intelligent component database for behavioral synthesis
Proceedings of the 27th ACM/IEEE conference on Design automation
Gwo-Dong Chen
, Daniel D. Gajski
|