ABSTRACT
Application-specific instruction-set extensions (custom instructions) help embedded processors achieve higher performance. Most custom instructions that offer a significant performance benefit require more than two input operands. Unfortunately, RISC-style embedded processors are designed to support at most two input operands per instruction. This data bandwidth problem stems from the limited number of register-file read ports available to each instruction and from the fixed-length instruction encoding. We propose to overcome this restriction by exploiting the data forwarding feature present in processor pipelines. With minimal modifications to the pipeline and the instruction encoding, along with cooperation from the compiler, we can supply up to two additional input operands per custom instruction. Experimental results indicate that our approach achieves 87% to 100% of the ideal performance limit for standard benchmark programs. Additionally, our scheme saves 25% energy on average by avoiding unnecessary accesses to the register file.
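The idea of drawing extra operands from the forwarding path can be illustrated with a toy model. The sketch below is not the paper's implementation: the `Instr` type, the helpers `operand_sources` and `is_encodable`, and the assumption that only the results of the two immediately preceding instructions are still available on the forwarding path are all hypothetical simplifications. Only the two limits (two register-file reads, up to two forwarded operands) come from the abstract.

```python
from dataclasses import dataclass

REGFILE_READ_PORTS = 2  # from the abstract: at most two register operands per instruction
MAX_FORWARDED = 2       # from the abstract: up to two additional operands via forwarding


@dataclass
class Instr:
    """Hypothetical instruction record: one destination, any number of sources."""
    dest: str
    srcs: tuple


def operand_sources(program, index):
    """Label each source operand of program[index] as 'forward' or 'regfile'.

    Assumption (not from the paper): only the results of the immediately
    preceding MAX_FORWARDED instructions are still on the forwarding path.
    """
    instr = program[index]
    in_flight = {p.dest for p in program[max(0, index - MAX_FORWARDED):index]}
    sources = {}
    forwarded = 0
    for src in instr.srcs:
        if src in in_flight and forwarded < MAX_FORWARDED:
            sources[src] = "forward"  # taken from the pipeline, no register-file read
            forwarded += 1
        else:
            sources[src] = "regfile"  # needs a register-file read port
    return sources


def is_encodable(program, index):
    """True if the instruction needs no more register-file reads than there are ports."""
    sources = operand_sources(program, index)
    regfile_reads = sum(1 for s in sources.values() if s == "regfile")
    return regfile_reads <= REGFILE_READ_PORTS


# A four-input custom instruction scheduled right after its two producers.
program = [
    Instr("r1", ("r5", "r6")),
    Instr("r2", ("r7", "r8")),
    Instr("r9", ("r1", "r2", "r3", "r4")),  # hypothetical custom instruction
]
print(operand_sources(program, 2))
print(is_encodable(program, 2))
```

In this model the custom instruction is feasible only when the compiler schedules its producers immediately before it, which is one plausible reading of the "cooperation from the compiler" mentioned in the abstract. Each operand labeled `forward` also skips a register-file read, which is the effect the abstract credits for the energy savings.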