skip to main content
10.1145/143095.143134acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article
Free Access

A dynamic scheduling method for irregular parallel programs

Published:01 July 1992Publication History

ABSTRACT

This paper develops a methodology for compiling and executing irregular parallel programs. Such programs implement parallel operations whose size and work distribution depend on input data. We show a fundamental relationship between three quantities that characterize an irregular parallel computation: the total available parallelism, the optimal grain size, and the statistical variance of execution times for individual tasks. This relationship yields a dynamic scheduling algorithm that substantially reduces the overhead of executing irregular parallel operations.

We incorporated this algorithm into an extended Fortran compiler. The compiler accepts as input a subset of Fortran D which includes blocked and cyclic decompositions and perfect alignment; it outputs Fortran 77 augmented with calls to library routines written in C. For irregular parallel operations, the compiled code gathers information about available parallelism and task execution time variance and uses this information to schedule the operation. On distributed memory architectures, the compiler encodes information about data access patterns for the runtime scheduling system so that it can preserve communication locality.

We evaluated these compilation techniques using a set of application programs including climate modeling, circuit simulation, and x-ray tomography, that contain irregular parallel operations. The results demonstrate that, for these applications, the dynamic techniques described here achieve near-optimal efficiency on large numbers of processors. In addition, they perform significantly better, on these problems, than any previously proposed static or dynamic scheduling algorithm.

References

  1. 1.B. Acldand, S. Lucco, T. London, and E. DeBenedictis. "CEMU: A Parallel Circuit Simulator,". In Proceedings o/the International Conference on Computer Design, October 1986.Google ScholarGoogle Scholar
  2. 2.F. Allen, M. Burke, P. Charles, R. Cytron, and J. Ferrante. "An Overview of the PTRAN Analysis System for Multiprocessing,". Journal o/Parallel and Distributed Computing, 5:617-640, October 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3.J. R. Alien, D. Baumgartner, K. Kennedy, and A. Porterfield. "PTOOL: A Semi-Automatic Parallel Programming Assistant,". In International Conference on Parallel Processing, pages 164-170, 1986.Google ScholarGoogle Scholar
  4. 4.J. R. Allen and K. Kennedy. "Automatic Loop Interchange,". In Proceedings of the SIGPLAN Symposium on Compiler Construction, pages 233-246, June 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5.A. Almgren. A Fast Adaptive Vortex Method Using Local Corrections. PhD thesis, Center for Pure and Applied Mathematics, UC/Berkeley, 1991.Google ScholarGoogle Scholar
  6. 6.A. Arakawa and V. R. Lamb. "Computational Design of the Basic Dynamical Processes of the UCLA General Circulation Model,". Methods in Computational Physics, 17:173-265, 1977.Google ScholarGoogle Scholar
  7. 7.D. CaUahan, K. Cooper, K. Kennedy, and L. Torczon. "Interprocedural Constant Propagation,". in Proceedings o/the SlGPLAN Symposium on Compiler Construction, pages 152-161, Palo Alto, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8.R. Cytron. "Limited Processor Scheduling of Doacross Loops,". in Proceedings ICPP, pages 226-234, 1987.Google ScholarGoogle Scholar
  9. 9.R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and K. Zadeck. "An Efficient Method of Computing Static Single Assignment Form,". In A CM Conference on Principles of Programming Languages, pages 25- 35, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10.M. Girkar and C. Polychronopoulos. "Partitioning Programs for Parallel Execution,". 1988 International Conference on $upercomputing (IC$), pages 217-229, July 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. 11.E. Gumbel. "The Maxima of the Mean of the Largest Value of the Range,". Annals of Mathematics and Statistics, 25, 1954.Google ScholarGoogle Scholar
  12. 12.O. Hastsson and A. Mayer. "Heuristic Search as Evidential Reasoning,". In Proceedings of the Fifth Workshop on Uncertainty in AI, August 1989.Google ScholarGoogle Scholar
  13. 13.K. A. Heiskanen. Tomography with Limited Data in Fan Beam Geometry. PhD thesis, UC/Berkeley, 1990.Google ScholarGoogle Scholar
  14. 14.S. Hiranandani, K. Kennedy, and C.-W. Tseng. "Cornprier Support for Machine-Independent Parallel Programming in Fortran D,". Technical Report TR91- 149, Rice University, March 1991.Google ScholarGoogle Scholar
  15. 15.S. F. Hummel, E. Schonberg, and L. E. Flynn. "Factoring: A Practical and Robust Method for Scheduling Parallel Loops,". Technical Report 74172, IBM Research Division, 1991.Google ScholarGoogle Scholar
  16. 16.C. Kruskal and A. Weiss. "Allocating Independent Subtasks on Parallel Processors,". IEEE Transactions on Software Engineering, SE-11, October 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. 17.S. Lucco. "A Heuristic Linda Kernel for Hypercube Multiprocessors,". In Proceedings of the Second Conference on Hypercube Multiprocessors, September 1987.Google ScholarGoogle Scholar
  18. 18.S. Lucco. "Parallel Programming in a Virtual Object Space,". In Conference on Object-Oriented Programruing Systems, Languages, and Applications, October 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. 19.S. Lucco and D. Anderson. "Tarmac: A Language Systern Substrate Based on Mobile Memory,". In International Conference on Distributed Computing Systems, 1990.Google ScholarGoogle ScholarCross RefCross Ref
  20. 20.S. Lucco and K. Nichols. "A Performance Analysis of Three Parallel Programming Methodologies in the Context of MOS Timing Simulation,". In Digest of Papers: IEEE Compcon, pages 205-210, 1987.Google ScholarGoogle Scholar
  21. 21.S. Lucco and O. Sharp. "Delirium: An Embedding Coordination Language,". In Proceedings o/Super. computing '90, pages 515-524, November 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. 22.S. Lucco and O. Sharp. "Parallel Programming With Coordination Structures,". In A CM Conference on the Principles of Programming Languages, :lanuary 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. 23.C. Polychronopoulis and D. Kuck. "Guided Self- Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers,". iEEE Transactions on Computers, C-36(12), December 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. 24.C. Polychronopoulos and U. Banerjee. "Speedup Bounds and Processor Allocation for Parallel Programs on a Multiprocessor,". Proceedings of the 1986 international Conference on Parallel Processing, pages 961-968, August 1986.Google ScholarGoogle Scholar
  25. 25.C. Polychronopoulos, M. Girkar, M. R. Haghighat, C. L. Lee, B. Leung, and D. Schouten. "Parafrase-2: An Environment for ParaUelizing, Partitioning, Synchronizing, and Scheduling Programs on Multiprocessots,". In Proceedings of the international Conference on Parallel Processing, volume II, pages 39-48, 1989.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. 26.L. Rudolph, M. Slivkin-Allalouf, and E. Upfal. "A Simple Load Balancing Scheme for Task Allocation in Parallel Machines,". In A CM Symposium on Parallel Algorithms and Architectures, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. 27.V. Sarkar. "Determining Average Program Execution Times and their Variance,". In SIGPLAN Conference on Programming Language Design and Implementation, June 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. 28.V. Sarkar and J. Hennessey. "Partitioning Parallel Programs for Macro Datafiow,'. In A CM Conference on Lisp and Functional Programming, pages 202-211, Cambridge, Mass., 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. 29.P. Tang and P.-C. Yew. "Processor Self-Scheduling for Multiple Nested Parallel Loops,". Proceedings o/the 1986 International Conference on Parallel Processing, pages 528-535, August 1986.Google ScholarGoogle Scholar
  30. 30.S. Ulam and N. Metropolis. "The Monte Carlo Method,". Journal of the American Statistics Association, 44:335ff, 1949.Google ScholarGoogle ScholarCross RefCross Ref
  31. 31.M. E. Wolf and M. S. Lam. "A Data Locality Optimizing Algorithm,". In SIGPLAN Conference on Programming Language Design and Implementation, pages 30-44, June 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. 32.M. E. Wolf and M. S. Lain. "A Loop Transformation Theory and an Algorithm to Maximize Parallelism,". IEEE Transactions on Parallel and Distributed Systems, 2(4):452-471, October 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. 33.M. Wolfe. Optimizing Supercompilers for Supercomputers. PhD thesis, University of Illinois at Urbana- Champaign, October 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. 34.M. Wolfe. "More Iteration Space Tiling,". In Proceedings of Supercomputing, pages 655-664, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A dynamic scheduling method for irregular parallel programs

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          PLDI '92: Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
          July 1992
          352 pages
          ISBN:0897914759
          DOI:10.1145/143095

          Copyright © 1992 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 July 1992

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate406of2,067submissions,20%

          Upcoming Conference

          PLDI '24

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader