A dynamic scheduling method for irregular parallel programs

ABSTRACT
This paper develops a methodology for compiling and executing irregular parallel programs. Such programs implement parallel operations whose size and work distribution depend on input data. We show a fundamental relationship between three quantities that characterize an irregular parallel computation: the total available parallelism, the optimal grain size, and the statistical variance of execution times for individual tasks. This relationship yields a dynamic scheduling algorithm that substantially reduces the overhead of executing irregular parallel operations.
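The paper's central claim is that the right grain size for an irregular parallel operation depends jointly on the total available parallelism and the variance of per-task execution times. As a rough illustration of that trade-off (a generic heuristic in the spirit of Kruskal/Weiss-style analyses, not the paper's actual algorithm; the function name and formula are hypothetical):

```python
import math

def chunk_size(n_tasks, n_procs, var_t, overhead):
    """Pick a grain size for self-scheduling n_tasks tasks on n_procs
    processors.  Higher variance in per-task time favors smaller chunks,
    so a straggler chunk cannot dominate the finish time; higher
    per-dispatch overhead favors larger chunks.  Illustrative only."""
    if var_t == 0:
        # Regular work: a single even division suffices.
        return max(1, n_tasks // n_procs)
    sigma = math.sqrt(var_t)
    # Balance per-chunk dispatch overhead against expected load imbalance.
    k = math.sqrt(2.0 * n_tasks * overhead / (n_procs * sigma))
    return max(1, min(int(k), n_tasks // n_procs or 1))
```

With zero variance the sketch degenerates to static block scheduling; as variance grows, the chunk size shrinks toward fine-grained dynamic dispatch, which is the qualitative behavior the abstract describes.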
We incorporated this algorithm into an extended Fortran compiler. The compiler accepts as input a subset of Fortran D which includes blocked and cyclic decompositions and perfect alignment; it outputs Fortran 77 augmented with calls to library routines written in C. For irregular parallel operations, the compiled code gathers information about available parallelism and task execution time variance and uses this information to schedule the operation. On distributed memory architectures, the compiler encodes information about data access patterns for the runtime scheduling system so that it can preserve communication locality.
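The compiled code is said to gather per-task execution-time statistics at run time to drive scheduling. One standard way a runtime could accumulate mean and variance in a single pass is Welford's online algorithm, sketched below (the class and method names are hypothetical, not the paper's runtime interface):

```python
class TaskStats:
    """Online mean/variance of observed task execution times, as a
    runtime scheduler might maintain while an irregular parallel
    operation executes (Welford's algorithm)."""

    def __init__(self):
        self.n = 0        # number of tasks timed so far
        self.mean = 0.0   # running mean of task times
        self.m2 = 0.0     # running sum of squared deviations

    def record(self, t):
        """Fold one task's measured execution time into the statistics."""
        self.n += 1
        d = t - self.mean
        self.mean += d / self.n
        self.m2 += d * (t - self.mean)

    def variance(self):
        """Sample variance; zero until at least two tasks are timed."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0
```

Because each update is O(1) and needs no stored history, such bookkeeping adds negligible overhead to task dispatch, which matters when the grain size itself is being tuned to minimize overhead.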
We evaluated these compilation techniques using a set of application programs, including climate modeling, circuit simulation, and X-ray tomography, that contain irregular parallel operations. The results demonstrate that, for these applications, the dynamic techniques described here achieve near-optimal efficiency on large numbers of processors. In addition, on these problems they perform significantly better than any previously proposed static or dynamic scheduling algorithm.