ACM Home Page
Please provide us with feedback. Feedback
Space-efficient scheduling of nested parallelism
Full text PdfPdf (481 KB)
Source ACM Transactions on Programming Languages and Systems (TOPLAS) archive
Volume 21 ,  Issue 1  (January 1999) table of contents
Pages: 138 - 173  
Year of Publication: 1999
ISSN:0164-0925
Authors
Girija J. Narlikar  Carnegie Mellon Univ., Pittsburgh, PA
Guy E. Blelloch  Carnegie Mellon Univ., Pittsburgh, PA
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 37,   Citation Count: 9
Additional Information:

abstract   references   cited by   index terms   review   collaborative colleagues   peer to peer  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/314602.314607
What is a DOI?

ABSTRACT

Many of today's high-level parallel languages support dynamic, fine-grained parallelism. These languages allow the user to expose all the parallelism in the program, which is typically of a much higher degree than the number of processors. Hence an efficient scheduling algorithm is required to assign computations to processors at runtime. Besides having low overheads and good load balancing, it is important for the scheduling algorithm to minimize the space usage of the parallel program. This article presents an on-line scheduling algorithm that is provably space efficient and time efficient for nested-parallel languages. For a computation with depth D and serial space requirement S1, the algorithm generates a schedule that requires at most S1 + O(K•D•p) space (including scheduler space) on p processors. Here, K is a user-adjustable runtime parameter specifying the net amount of memory that a thread may allocate before it is preempted by the scheduler. Adjusting the value of K provides a trade-off between the running time and the memory requirement of a parallel computation. To allow the scheduler to scale with the number of processors we also parallelize the scheduler and analyze the space and time bounds of the computation to include scheduling costs. In addition to showing that the scheduling algorithm is space and time efficient in theory, we demonstrate that it is effective in practice. We have implemented a runtime system that uses our algorithm to schedule lightweight parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques, without compromising performance.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
4
5
6
7
 
8
BLUMOFE~ R. D. AND LEISERSON~ C. E. 1994. Scheduling multithreaded computations by work stealing. In Proc. 35th IEEE Syrup. on Foundations of Computer Science, pp. 356-368.
 
9
 
10
BURTON~ F. W. AND SIMPSON~ D. J. 1994. Space efficient execution of deterministic parallel programs. Manuscript.
11
 
12
 
13
14
 
15
 
16
17
 
18
 
19
FREEH, V. W., LOWENTHAL, D. K., AND ANDREWS, G. R. 1994. Distributed filaments: efficient fine-grain parallelism on a cluster of workstations. In 1st Symposium on Operating Systems Design and Implementation, Monterey, CA, pp. 201-212.
 
20
GOLDSTEIN, S. C., CULLER, D. E., AND SCHAUSER, K. E. 1995. Enabling primitives for compiling parallel languages. In 3rd Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers, Rochester, NY.
 
21
CJREENGARD, L. 1987. The rapid evaluation of potential fields in particle systems. The MIT Press.
 
22
HALBHERR, M., ZHOU, Y., AND JOERG, C. F. 1994. Parallel programming based on continuationpassing thread. In Proc. 2nd International Workshop on Massive Parallelism: Hardware, Software and Applications, Capri, Italy.
23
 
24
HPF FORUM 1993. High Performance Fortran language specification, Version 1.0.
25
 
26
27
 
28
IEEE 1985. Threads extension for portable operating systems (draft 6).
 
29
 
30
MELLOR-CRUMMEY, J. M. 1987. Concurrent queues: Practical Fetch-and-q) algorithms. Technical Report 229 (Nov.), University of Rochester.
 
31
MILLS, P. H., NYLAND, L. S., PRINS, J. F., REIF, J. H., AND WAGNER, R. A. 1990. Prototyping parallel and distributed programs in Proteus. Technical Report UNC-CH TR90-041, Computer Science Department, University of North Carolina.
 
32
 
33
MUELLER, F. 1993. A library implementation of POSIX threads under unix. In Proc. Winter 1993 USENIX Technical Conference and Exhibition, San Diego, CA, USA, pp. 29-41.
 
34
NARLIKAR, G. J. 1999. Space-efficient multithreading. School of Computer Science, Carnegie Mellon University. Ph.D. thesis, to appear.
 
35
NARLIKAR, CJ. J. AND BLELLOCH, CJ. E. 1996. A framework for space and time efficient scheduling of parallelism. Technical Report CMU-CS-96-197, Computer Science Department, Carnegie Mellon University.
 
36
 
37
 
38
POWELL, M. L., I~LEIMAN, S. R., BARTON, S., SHAH, D., STEIN, D., AND WEEKS, M. 1991. SunOS multi-thread architecture. In USENIX ASSOCIATION (Ed.), Proc. Winter 1991 USENIX Conference: Dallas, TX, USA, pp. 65-80.
 
39
 
40
41
 
42
 
43
STRASSEN, V. 1969. Gaussian elimination is not optimal. Numerische Mathematik 13, 354-356.
 
44
 
45

CITED BY  9
 


REVIEW

"M. S. Joy : Reviewer"

The authors consider the problem of process scheduling on a shared-memory parallel machine where the parallelism that can be specified in a user program is substantially greater than the number of available processors. A new asynchro more...

Collaborative Colleagues:
Girija J. Narlikar: colleagues
Guy E. Blelloch: colleagues

Peer to Peer - Readers of this Article have also read: