Cluster assignment of global values for clustered VLIW processors
Source: International Conference on Compilers, Architecture and Synthesis for Embedded Systems
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
San Jose, California, USA
SESSION: Compilation
Pages: 32 - 40  
Year of Publication: 2003
ISBN:1-58113-676-5
Authors
Andrei Terechko  Philips Research, Eindhoven, The Netherlands
Erwan Le Thénaff  Philips Research, Eindhoven, The Netherlands
Henk Corporaal  Technical University Eindhoven, Eindhoven, The Netherlands
Sponsors
ACM: Association for Computing Machinery
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/951710.951717

ABSTRACT

In this paper, high-level language (HLL) variables that are live throughout an entire HLL function, across multiple scheduling units, are termed global values. Because of their long live ranges and, hence, large impact on the schedule, global values require different compiler optimizations than local values, which span only one scheduling unit. The instruction scheduler for a clustered ILP processor, which is responsible for cluster assignment of operations and variables, faces the difficult problem of assigning global values to clusters. Our study shows that trivial assignments (e.g., mapping all global values to one cluster) may result in a severe cycle count overhead relative to the unicluster machine of up to 26.3% for a four-cluster VLIW machine. This paper presents three advanced algorithms for assigning global values to clusters, based on multi-pass scheduling and affinity of variables. Furthermore, we measure the performance of these algorithms on optimized multimedia C applications and assess their quality by comparing them to a practical upper performance bound derived from an extensive random search. Our algorithms reduce the execution time overhead of the best simple algorithm, round-robin, from 10.5% to 5.9% for the two-cluster VLIW machine and from 17.3% to 14.12% for the four-cluster VLIW machine.
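The two simple baselines named in the abstract (mapping all global values to one cluster, and round-robin) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the representation of global values as strings, and the overhead formula (extra cycles relative to the unicluster cycle count) are assumptions made for illustration only. The three advanced algorithms are not described in enough detail in the abstract to sketch here.

```python
from typing import Dict, List


def assign_single_cluster(global_values: List[str], cluster: int = 0) -> Dict[str, int]:
    """Trivial baseline (hypothetical helper): put every global value in one cluster."""
    return {value: cluster for value in global_values}


def assign_round_robin(global_values: List[str], num_clusters: int) -> Dict[str, int]:
    """Simple baseline (hypothetical helper): deal global values out to clusters in turn."""
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")
    return {value: i % num_clusters for i, value in enumerate(global_values)}


def overhead_percent(clustered_cycles: int, unicluster_cycles: int) -> float:
    """Cycle count overhead relative to the unicluster, in percent.

    Assumed definition: 100 * (clustered - unicluster) / unicluster.
    """
    if unicluster_cycles <= 0:
        raise ValueError("unicluster_cycles must be positive")
    return 100.0 * (clustered_cycles - unicluster_cycles) / unicluster_cycles


if __name__ == "__main__":
    values = ["a", "b", "c", "d", "e"]  # made-up global value names
    print(assign_single_cluster(values))
    print(assign_round_robin(values, num_clusters=2))
    # Made-up cycle counts, chosen only to exercise the formula.
    print(overhead_percent(clustered_cycles=1105, unicluster_cycles=1000))
```

Round-robin spreads register pressure evenly but ignores which operations use which values, which is presumably the gap that the affinity-based algorithms in the paper aim to close.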



