| Operation tables for scheduling in the presence of incomplete bypassing |
International Conference on Hardware Software Codesign
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Stockholm, Sweden
SESSION: Software and hardware techniques for performance optimisation of embedded applications
Pages: 194 - 199
Year of Publication: 2004
ISBN: 1-58113-937-3
ABSTRACT
Register bypassing is a powerful and widely used feature in modern processors to eliminate certain data hazards. Although complete bypassing is ideal for performance, bypassing has a significant impact on the cycle time, area, and power consumption of the processor. Due to the strict constraints on performance, cost, and power consumption in embedded processors, architects need to evaluate and implement incomplete register bypassing mechanisms. However, traditional data hazard detection and/or avoidance techniques used in retargetable schedulers break down in the presence of incomplete bypassing. In this paper, we present the concept of Operation Tables, which can be used to detect data hazards even in the presence of incomplete bypassing. Furthermore, our technique integrates the detection of both data and resource hazards, and can be easily employed in a compiler to generate better schedules. Our experimental results on the popular Intel XScale embedded processor platform show that, even with a simple intra-basic-block scheduling technique, we achieve up to 20% performance improvement over fully optimized GCC-generated code on embedded applications from the MiBench suite.
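To illustrate why a single fixed operation latency stops working once bypasses are missing, the following is a minimal sketch in Python. It is not the Operation Table structure from the paper and does not model the XScale pipeline. The five-stage pipeline, the stage names, and the function `stall_cycles` are all hypothetical and chosen only for illustration. The sketch assumes that the result is produced in stage X, that a consumer needs its operand when it enters X, and that a value can be forwarded only from the stages listed in `bypass_from`.

```python
# Toy model (assumption): a 5-stage in-order pipeline, one stage per cycle.
STAGES = ["F", "D", "X", "M", "W"]
X_INDEX = STAGES.index("X")  # result is computed in X; operands are consumed on entry to X


def stall_cycles(distance, bypass_from):
    """Return the stalls a consumer suffers when issued `distance` cycles after its producer.

    `bypass_from` is the set of stage names (after X) that have a forwarding path
    to the consumer's operand input. An empty set means no bypassing at all.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    stall = 0
    while True:
        # Stage index the producer occupies in the cycle the consumer enters X.
        producer_index = distance + stall + X_INDEX
        if producer_index >= len(STAGES):
            return stall  # producer has left W: value is in the register file
        if producer_index > X_INDEX and STAGES[producer_index] in bypass_from:
            return stall  # value can be forwarded from the producer's current stage
        stall += 1  # no path this cycle: the consumer waits, the producer advances


if __name__ == "__main__":
    for name, paths in [("complete", {"M", "W"}), ("only M", {"M"}), ("none", set())]:
        print(name, [stall_cycles(d, paths) for d in (1, 2, 3)])
```

With only the M bypass present, a consumer issued one cycle after the producer does not stall, while one issued two cycles after does. No single latency value captures that behaviour, which is the kind of case the abstract says traditional hazard detection cannot handle.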
CITED BY 4
Florian Brandner , Dietmar Ebner , Andreas Krall, Compiler generation from structural architecture descriptions, Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems, September 30-October 03, 2007, Salzburg, Austria