ABSTRACT
By compiling ordinary scientific applications programs with a radical technique called trace scheduling, we are generating code for a parallel machine that will run these programs faster than an equivalent sequential machine: we expect 10 to 30 times faster.
Trace scheduling generates code for machines called Very Long Instruction Word (VLIW) architectures. In a VLIW machine, many statically scheduled, tightly coupled, fine-grained operations execute in parallel within a single instruction stream. VLIWs are more highly parallel extensions of several current architectures.
These current architectures have never cracked a fundamental barrier: the speedup they get from parallelism is never more than a factor of 2 to 3. It is not that we couldn't build more parallel machines of this type, but until trace scheduling we didn't know how to generate code for them. Trace scheduling finds sufficient parallelism in ordinary code to justify thinking about a highly parallel VLIW.
At Yale we are actually building one. Our machine, the ELI-512, has a horizontal instruction word of over 500 bits and will do 10 to 30 RISC-level operations per cycle [Patterson 82]. ELI stands for Enormously Longword Instructions; 512 is the size of the instruction word we hope to achieve. (The current design has a 1200-bit instruction word.)
Once it became clear that we could actually compile code for a VLIW machine, some new questions appeared, and answers are presented in this paper. How do we put enough tests in each cycle without making the machine too big? How do we put enough memory references in each cycle without making the machine too slow?
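To make the notion of "many statically scheduled, fine-grained operations within a single instruction stream" concrete, the following is a minimal sketch of packing straight-line operations into wide instruction words at compile time. It is not trace scheduling itself, which looks beyond basic blocks; it is only a greedy compaction of one basic block. The names `Op`, `depends`, and `pack`, the register-style operands, and the `width` parameter are all assumptions made for illustration and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One fine-grained operation (hypothetical representation)."""
    dest: str      # name of the value written
    srcs: tuple    # names of the values read


def depends(later, earlier):
    """True if `later` must be issued in a later word than `earlier`.

    Conservative assumption: any read-after-write, write-after-write,
    or write-after-read hazard forces the two ops into different words.
    """
    return (earlier.dest in later.srcs       # read after write
            or earlier.dest == later.dest    # write after write
            or later.dest in earlier.srcs)   # write after read


def pack(ops, width):
    """Greedily place each op in the earliest legal word with a free slot."""
    words = []     # words[w] is the list of ops issued together in word w
    word_of = []   # word_of[i] is the index of the word holding ops[i]
    for i, op in enumerate(ops):
        earliest = 0
        for j in range(i):
            if depends(op, ops[j]):
                earliest = max(earliest, word_of[j] + 1)
        w = earliest
        while w < len(words) and len(words[w]) >= width:
            w += 1                     # word is full, try the next one
        if w == len(words):
            words.append([])
        words[w].append(op)
        word_of.append(w)
    return words


# Six sequential operations: a, b, d are loads; c, e, f consume them.
program = [
    Op("a", ("x",)),
    Op("b", ("y",)),
    Op("c", ("a", "b")),
    Op("d", ("z",)),
    Op("e", ("c", "d")),
    Op("f", ("b",)),
]

for word in pack(program, width=4):
    print(" | ".join(op.dest for op in word))
```

With `width=4` the six operations fit into three words, while `width=1` degenerates to six words, one operation each. The scheduling decision is made entirely before execution, which is the sense of "statically scheduled" used above.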
- 1. A. V. Aho and J. D. Ullman. Principles of Compiler Design. Addison-Wesley, 1977.
- 2. S. Dasgupta. The Organization of Microprogram Stores. ACM Comp. Surv. 11(1):39-65, March 1979.
- 3. J. A. Fisher. An effective packing method for use with 2^n-way jump instruction hardware. In 13th annual microprogramming workshop, pages 64-75. ACM Special Interest Group on Microprogramming, November 1980.
- 4. J. A. Fisher. Trace scheduling: A technique for global microcode compaction. IEEE Transactions on Computers C-30(7):478-490, July 1981.
- 5. C. C. Foster and E. M. Riseman. Percolation of code to enhance parallel dispatching and execution. IEEE Transactions on Computers 21(12):1411-1415, December 1972.
- 6. T. R. Gross and J. L. Hennessy. Optimizing Delayed Branches. In 15th annual workshop on microprogramming, pages 114-120. ACM Special Interest Group on Microprogramming, October 1982.
- 7. J. Hennessy, N. Jouppi, S. Przybylski, C. Rowen, T. Gross, F. Baskett, and J. Gill. MIPS: A Microprocessor Architecture. In 15th annual workshop on microprogramming, pages 17-22. ACM Special Interest Group on Microprogramming, October 1982.
- 8. D. Jacobs, J. Prins, P. Siegel, and K. Wilson. Monte carlo techniques in code optimization. In 15th annual workshop on microprogramming, pages 143-148. ACM Special Interest Group on Microprogramming, October 1982.
- 9. A. Nicolau and J. A. Fisher. Using an oracle to measure parallelism in single instruction stream programs. In 14th annual microprogramming workshop, pages 171-182. ACM Special Interest Group on Microprogramming, October 1981.
- 10. D. A. Padua, D. J. Kuck, and D. H. Lawrie. High speed multiprocessors and compilation techniques. IEEE Transactions on Computers 29(9):763-776, September 1980.
- 11. D. A. Patterson, K. Lew, and R. Tuck. Towards an efficient machine-independent language for microprogramming. In 12th annual microprogramming workshop, pages 22-35. ACM Special Interest Group on Microprogramming, 1979.
- 12. D. A. Patterson and C. H. Sequin. A VLSI RISC. Computer 15(9):8-21, September 1982.
- 13. E. M. Riseman and C. C. Foster. The inhibition of potential parallelism by conditional jumps. IEEE Transactions on Computers 21(12):1405-1411, December 1972.
- 14. G. S. Tjaden and M. J. Flynn. Detection and parallel execution of independent instructions. IEEE Transactions on Computers 19(10):889-895, October 1970.
- 15. M. Tokoro, T. Takizuka, E. Tamura, and I. Yamaura. Towards an Efficient Machine-Independent Language for Microprogramming. In 11th Annual Microprogramming Workshop, pages 41-50. SIGMICRO, 1978.