ABSTRACT
Superscalar processors currently have the potential to fetch multiple basic blocks per cycle by employing one of several recently proposed instruction fetch mechanisms. However, this increased fetch bandwidth cannot be exploited unless the pipeline stages further downstream improve correspondingly. In particular, renaming the registers of a large number of instructions per cycle is difficult, and a large instruction window, needed to receive multiple basic blocks per cycle, slows down dependence resolution and instruction issue. This paper addresses these and related issues by proposing (i) partitioning of the instruction window into multiple blocks, each holding a dynamic code sequence; (ii) logical partitioning of the register file into a global file and several local files, the latter holding registers local to a dynamic code sequence; and (iii) dynamic recording and reuse of register renaming information for registers local to a dynamic code sequence. Performance studies show that these mechanisms improve performance over traditional superscalar processors by factors ranging from 1.5 to a little over 3 for the SPEC Integer programs. Next, it is observed that several of the loops in the benchmarks display vector-like behavior during execution, even if the static loop bodies are likely too complex for compile-time vectorization. A dynamic loop vectorization mechanism that builds on top of the above mechanisms is briefly outlined. The mechanism vectorizes up to 60% of the dynamic instructions for some programs, although the average number of iterations per loop is quite small.
CITED BY 25
Eric Rotenberg , Quinn Jacobson , Yiannakis Sazeides , Jim Smith, Trace processors, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.138-148, December 01-03, 1997, Research Triangle Park, North Carolina, United States