Abstract
Memory-related anti- and output-dependences can limit the parallelism available in ordinary programs. On a distributed-memory system, improper partitioning and distribution of the data involved in such dependences may incur unnecessary communication and load imbalance. In this extended abstract, we present an overview of our work on using array privatization to enhance inherent parallelism and to reduce communication.
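The idea can be illustrated with a small sketch (not taken from the paper; function names are hypothetical). A temporary array that is overwritten in every iteration of an outer loop creates loop-carried anti- and output-dependences, even though each iteration's computation is logically independent; giving each iteration its own private copy of the temporary removes those dependences:

```python
def compute_shared(a):
    # Serial form: every iteration of the i-loop reuses the single
    # temporary array `t`, creating loop-carried output and
    # anti-dependences on `t` that block parallelization.
    n = len(a)
    t = [0.0] * n
    out = [0.0] * n
    for i in range(n):
        for j in range(n):
            t[j] = a[i] * j      # overwrites t written by iteration i-1
        out[i] = sum(t)
    return out

def compute_privatized(a):
    # Privatized form: each iteration allocates its own copy of `t`,
    # so iterations of the i-loop are independent and could run in
    # parallel (and, on a distributed-memory machine, without
    # communicating `t` between processors).
    n = len(a)
    def body(i):
        t = [a[i] * j for j in range(n)]  # private temporary
        return sum(t)
    return [body(i) for i in range(n)]
```

Both versions compute the same result; privatization changes only the storage of the temporary, trading a modest amount of extra memory for independence between iterations.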