ABSTRACT
Long Instruction Word (LIW) architectures exploit parallelism among multiple functional units. To produce efficient code for such an architecture, a microcode compiler must expose a relatively large degree of fine-grain parallelism and must account for the detailed low-level characteristics of the architecture. This paper describes a microcode compiler developed at IRISA for such architectures. After a brief overview of the compilation process, we focus on loop scheduling techniques. We first describe the software pipelining algorithm. We then introduce a new unrolling-based optimization algorithm and compare it with the classical software pipelining algorithm. This algorithm differs from traditional loop unrolling in that unrolling is used only to find a cyclic schedule for the loop; that schedule is then used to construct a software pipeline.
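The unrolling-then-fold idea can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a greedy list scheduler, fully pipelined functional units, and a simple window-based periodicity test, and every name in it (`Op`, `Dep`, `schedule_unrolled`, `find_period`, `fold_kernel`) is hypothetical. The loop body is unrolled and scheduled iteration by iteration; once the schedule repeats with a fixed shift of `d` cycles every `p` iterations, those `p` iterations are folded modulo `d` into a kernel, i.e. a software pipeline with an average initiation interval of `d / p`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    unit: str     # functional-unit class, e.g. "mem", "alu"
    latency: int  # cycles until the result can be consumed


@dataclass(frozen=True)
class Dep:
    src: str
    dst: str
    distance: int  # 0 = same iteration, 1 = next iteration, ...


def schedule_unrolled(ops, deps, units, n_iter):
    """Greedy list scheduling of n_iter unrolled copies of the loop body.

    Assumes `ops` is topologically ordered for distance-0 dependences and
    that each of the units[u] copies of class u accepts one op per cycle.
    Returns times[i][op_name] = issue cycle of op in iteration i.
    """
    by_name = {op.name: op for op in ops}
    busy = {}  # (cycle, unit class) -> number of ops already issued there
    times = []
    for i in range(n_iter):
        t_i = {}
        times.append(t_i)  # appended first so distance-0 deps can read it
        for op in ops:
            earliest = 0
            for d in deps:
                j = i - d.distance
                if d.dst == op.name and j >= 0:
                    earliest = max(earliest,
                                   times[j][d.src] + by_name[d.src].latency)
            t = earliest
            while busy.get((t, op.unit), 0) >= units[op.unit]:
                t += 1  # resource conflict: slide to the next cycle
            busy[(t, op.unit)] = busy.get((t, op.unit), 0) + 1
            t_i[op.name] = t
    return times


def find_period(times, ops, max_period=4, window=4):
    """Heuristic: find (p, d) such that every op of each of the last
    `window` iterations issues exactly d cycles after iteration i - p."""
    n = len(times)
    for p in range(1, max_period + 1):
        if n < window + p:
            break
        i0 = n - window
        d = times[i0][ops[0].name] - times[i0 - p][ops[0].name]
        if all(times[i][op.name] - times[i - p][op.name] == d
               for i in range(i0, n) for op in ops):
            return p, d
    return None


def fold_kernel(times, ops, first, p, d):
    """Fold iterations first..first+p-1 into a d-cycle kernel.

    Row c lists (op name, iteration offset, stage), where stage counts
    how many kernel repetitions after the window start the op issues.
    """
    base = min(times[first + k][op.name] for k in range(p) for op in ops)
    rows = [[] for _ in range(d)]
    for k in range(p):
        for op in ops:
            rel = times[first + k][op.name] - base
            rows[rel % d].append((op.name, k, rel // d))
    return rows


# Hypothetical loop body: load, add, store; one memory port, one ALU.
ops = [Op("ld", "mem", 2), Op("add", "alu", 1), Op("st", "mem", 1)]
deps = [Dep("ld", "add", 0), Dep("add", "st", 0)]
units = {"mem": 1, "alu": 1}

times = schedule_unrolled(ops, deps, units, n_iter=12)
p, d = find_period(times, ops)
print(p, d, d / p)  # 3 6 2.0
for c, row in enumerate(fold_kernel(times, ops, len(times) - 4, p, d)):
    print(c, row)
```

Here the cyclic pattern spans three iterations every six cycles, so the folded kernel sustains one iteration per two cycles, which matches the bound set by the single memory port. The periodicity test is only a heuristic: with a recurrence, operations that are independent of it (such as loads) can keep drifting ahead under greedy scheduling, so a real compiler needs additional constraints before an exact repeating pattern is guaranteed.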
Loop optimization for horizontal microcoded machines
Special Issue: Proceedings of the 4th International Conference on Supercomputing