
Access normalization: loop restructuring for NUMA computers

Published: 01 November 1993

Abstract

In scalable parallel machines, processors can make local memory accesses much faster than they can make remote memory accesses. Additionally, when a number of remote accesses must be made, it is usually more efficient to use block transfers of data rather than to use many small messages. To run well on such machines, software must exploit these features. We believe it is too onerous for a programmer to do this by hand, so we have been exploring the use of restructuring compiler technology for this purpose. In this article, we start with a language like HPF-Fortran with user-specified data distribution and develop a systematic loop transformation strategy called access normalization that restructures loop nests to exploit locality and block transfers. We demonstrate the power of our techniques using routines from the BLAS (Basic Linear Algebra Subprograms) library. An important feature of our approach is that we model loop transformation using invertible matrices and integer lattice theory.
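The key idea named in the abstract is that a loop transformation can be modeled as an invertible integer matrix applied to the iteration space, after which array accesses become simple (ideally identity) functions of the new loop indices. As an illustration only, and not the paper's own code or algorithm (which targets HPF-style Fortran with user-specified data distributions), here is a minimal C sketch of one such unimodular transformation: the skewed access A[i+j][j] is normalized to the identity access A[u][v], so the inner loop walks each row of A contiguously. The example, array sizes, and the particular skewed access are hypothetical.

```c
#include <stdio.h>

#define N 4

int main(void) {
    /* A is sized to hold all rows touched by A[i+j][j], i.e. rows 0..2N-2. */
    static double A[2 * N - 1][N];
    for (int u = 0; u <= 2 * N - 2; u++)
        for (int v = 0; v < N; v++)
            A[u][v] = u + v;

    double s0 = 0.0, s1 = 0.0;

    /* Original nest: the skewed access A[i+j][j] jumps across rows. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s0 += A[i + j][j];

    /* Transformed nest, using the unimodular map (u, v) = (i + j, j),
     * with inverse (i, j) = (u - v, v). The new bounds come from pushing
     * the original constraints 0 <= i < N, 0 <= j < N through the inverse:
     * 0 <= u - v < N and 0 <= v < N. */
    for (int u = 0; u <= 2 * N - 2; u++) {
        int vlo = (u - N + 1 > 0) ? u - N + 1 : 0;
        int vhi = (u < N - 1) ? u : N - 1;
        for (int v = vlo; v <= vhi; v++)
            s1 += A[u][v]; /* normalized: identity access, contiguous in v */
    }

    printf("%f %f\n", s0, s1); /* both nests compute the same sum */
    return 0;
}
```

Deriving the transformed loop bounds by mapping the original iteration polyhedron through the inverse matrix is the step the paper formalizes with integer lattice theory; on a NUMA machine the same normalization exposes which iterations touch locally stored data and which accesses can be batched into block transfers.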

