ABSTRACT
The physical register file is an important component of a dynamically-scheduled processor. Increasing the amount of parallelism places increasing demands on the physical register file, calling for alternative file organization and management strategies. This paper considers the use of value locality to optimize the operation of physical register files.

We present empirical data showing that: (i) the value produced by an instruction is often the same as a value produced by another recently executed instruction, resulting in multiple physical registers containing the same value, and (ii) the values 0 and 1 account for a considerable fraction of the values written to and read from physical registers. The paper then presents three schemes to exploit the above observations.

The first scheme extends a previously-proposed scheme to use only a single physical register for each unique value. The second scheme is a special case for the values 0 and 1. By restricting optimization to these values, the second scheme eliminates many of the drawbacks of the first scheme. The third scheme further improves on the second, resulting in an optimization that reduces physical register requirements with simple micro-architectural extensions. A performance evaluation of the three schemes is also presented.
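The two observations in the abstract can be made concrete with a small measurement sketch. This is not the paper's methodology: the function name `value_locality_stats`, the `window` parameter, and the toy trace are all assumptions introduced for illustration. Given a sequence of result values, it reports the fraction that are 0 or 1 and the fraction that match one of the last `window` results.

```python
from collections import deque


def value_locality_stats(values, window=4):
    """Return (fraction of results equal to 0 or 1,
               fraction of results matching one of the last `window` results).

    Hypothetical helper: `values` stands in for the stream of values that
    instructions would write to physical registers.
    """
    recent = deque(maxlen=window)  # sliding window of the most recent results
    zero_one = 0
    repeated = 0
    total = 0
    for v in values:
        total += 1
        if v in (0, 1):
            zero_one += 1
        if v in recent:
            # A recently produced value is being produced again, so a second
            # register would hold a duplicate of it.
            repeated += 1
        recent.append(v)
    if total == 0:
        return 0.0, 0.0
    return zero_one / total, repeated / total


# Toy trace, invented for illustration only.
trace = [0, 5, 0, 1, 7, 5, 1, 0]
print(value_locality_stats(trace, window=4))  # (0.625, 0.375)
```

A real study would take the values from a simulator's writeback stream rather than a hand-written list, and the appropriate window size depends on how many physical registers are live at once.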