ABSTRACT
In this paper, high-level language (HLL) variables that are live throughout an HLL function, across multiple scheduling units, are termed global values. Because of their long live ranges and their correspondingly large impact on the schedule, global values require different compiler optimizations than local values, which span only one scheduling unit. The instruction scheduler for a clustered ILP processor, which is responsible for assigning operations and variables to clusters, faces the difficult problem of assigning global values to clusters. Our study shows that trivial assignments (e.g., mapping all global values to one cluster) may incur a severe cycle count overhead relative to the unicluster machine, up to 26.3% for a four-cluster VLIW machine. This paper presents three advanced algorithms for assigning global values to clusters, based on multi-pass scheduling and on the affinity of variables. Furthermore, we measure the performance of these algorithms on optimized multimedia C applications and assess their quality by comparing them to a practical upper bound on performance derived from a vast random search. Relative to round-robin, the best of the simple algorithms, our algorithms reduce the execution time overhead from 10.5% to 5.9% for the two-cluster VLIW machine and from 17.3% to 14.12% for the four-cluster VLIW machine.
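The two simple baselines named in the abstract, mapping all global values to one cluster and round-robin assignment, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the representation of global values as strings, and the order in which round-robin visits the values are all assumptions made for illustration. The three advanced algorithms are not sketched, since the abstract does not describe them in enough detail.

```python
from typing import Dict, List


def assign_single_cluster(global_values: List[str], cluster: int = 0) -> Dict[str, int]:
    """Trivial baseline: map every global value to the same cluster.

    Hypothetical helper; the default cluster index 0 is an assumption.
    """
    return {name: cluster for name in global_values}


def assign_round_robin(global_values: List[str], num_clusters: int) -> Dict[str, int]:
    """Simple baseline: map the i-th global value to cluster i mod num_clusters.

    Hypothetical helper; the visiting order (list order here) is an assumption.
    """
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")
    return {name: i % num_clusters for i, name in enumerate(global_values)}


if __name__ == "__main__":
    # Made-up variable names standing in for the global values of one function.
    values = ["sum", "ptr", "count", "scale", "limit"]
    print(assign_single_cluster(values))
    print(assign_round_robin(values, num_clusters=2))
```

Neither baseline looks at how the values are used, which is consistent with the abstract's point that simple assignments leave a noticeable overhead relative to the unicluster machine.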