Loop optimization for horizontal microcoded machines

Authors:
François Bodin

IRISA, Campus de Beaulieu, 35042 Rennes-Cedex, France

François Charot

IRISA, Campus de Beaulieu, 35042 Rennes-Cedex, France


ICS '90: Proceedings of the 4th international conference on Supercomputing, June 1990, Pages 164–176
https://doi.org/10.1145/77726.255153

Published: 1 June 1990

ABSTRACT

Long Instruction Word (LIW) architectures exploit parallelism among their functional units. To produce efficient code for such an architecture, the microcode compiler must expose a relatively large degree of fine-grain parallelism and must take into account the low-level characteristics of the architecture. This paper describes a microcode compiler developed at IRISA for such architectures. After a brief overview of the compilation process, we focus on loop scheduling techniques. The software pipelining algorithm is described first. A new unrolling-based optimization algorithm is then introduced and compared with the classical software pipelining algorithm. It differs from traditional loop unrolling in that the loop is unrolled only to find a cyclic schedule of the loop; this schedule is then used to construct a software pipeline.
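
To make the idea of software pipelining concrete, here is a minimal hand-written sketch in Python. It is not the compiler or the algorithm described in the paper: the three-stage loop body (load, multiply, store) and the function names `loop_sequential` and `loop_pipelined` are assumptions chosen purely for illustration. The sketch shows the usual prologue, kernel, and epilogue shape, in which the kernel overlaps operations that belong to three different iterations.

```python
def loop_sequential(a, k):
    """Reference loop: each iteration runs load, multiply, store in order."""
    out = [0] * len(a)
    for i in range(len(a)):
        x = a[i]        # stage 1: load
        y = x * k       # stage 2: multiply
        out[i] = y      # stage 3: store
    return out


def loop_pipelined(a, k):
    """Same computation, software-pipelined by hand (illustrative only)."""
    n = len(a)
    if n < 2:
        # Too few iterations to fill the pipeline; use the plain loop.
        return loop_sequential(a, k)
    out = [0] * n
    # Prologue: start iterations 0 and 1 to fill the pipeline.
    x = a[0]            # load(0)
    y = x * k           # mul(0)
    x = a[1]            # load(1)
    # Kernel: the three operations belong to iterations i-2, i-1 and i.
    # They are independent of one another, so on an LIW machine they
    # could, in principle, be packed into a single instruction word.
    for i in range(2, n):
        out[i - 2] = y  # store(i-2)
        y = x * k       # mul(i-1)
        x = a[i]        # load(i)
    # Epilogue: drain the iterations still in flight.
    out[n - 2] = y      # store(n-2)
    y = x * k           # mul(n-1)
    out[n - 1] = y      # store(n-1)
    return out


print(loop_pipelined([1, 2, 3, 4], 5))  # [5, 10, 15, 20]
```

In this sketch the kernel is written sequentially, so the order store, multiply, load matters: each statement reads a value before the next statement overwrites it. A real LIW schedule would read all operands at the start of the instruction word instead.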


Published in
ICS '90: Proceedings of the 4th international conference on Supercomputing
June 1990
492 pages
ISBN: 0897913698
DOI: 10.1145/77726
Chairmen: Ahmed Sameh (Univ. of Illinois), Henk van der Vorst (Delft Univ. of Technology and CWI, The Netherlands)

ACM SIGARCH Computer Architecture News, Volume 18, Issue 3b
Special Issue: Proceedings of the 4th international conference on Supercomputing
Sept. 1990
489 pages
ISSN: 0163-5964
DOI: 10.1145/255129
Chairmen: Ahmed Sameh (Univ. of Illinois at Urbana-Champaign, Urbana), Henk van der Vorst (Delft Univ. of Technology and CWI, The Netherlands)
Copyright © 1990 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 584 of 2,055 submissions, 28%