| Automatic compiler techniques for thread coarsening for multithreaded architectures |
| Full text |
Pdf
(1.09 MB)
|
| Source
|
International Conference on Supercomputing
archive
Proceedings of the 14th international conference on Supercomputing
table of contents
Santa Fe, New Mexico, United States
Pages: 306 - 315
Year of Publication: 2000
ISBN:1-58113-270-0
|
|
Authors
|
|
Gary M. Zoppetti
|
Department of Computer and Information Sciences, University of Delaware, Newark DE
|
|
Gagan Agrawal
|
Department of Computer and Information Sciences, University of Delaware, Newark DE
|
|
Lori Pollock
|
Department of Computer and Information Sciences, University of Delaware, Newark DE
|
|
Jose Nelson Amaral
|
Department of Electrical and Computer Engineering, University of Delaware, Newark DE
|
|
Xinan Tang
|
Chameleon Systems Inc., Sunnyvale, CA and Department of Electrical and Computer Engineering, University of Delaware, Newark DE
|
|
Guang Gao
|
Department of Electrical and Computer Engineering, University of Delaware, Newark DE
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 4, Downloads (12 Months): 33, Citation Count: 1
|
|
|
ABSTRACT
Multithreaded architectures are emerging as an important class of parallel machines. By allowing fast context switching between threads on the same processor, these systems hide communication and synchronization latencies and allow scalable parallelism for dynamic and irregular applications. Thread partitioning is the most important task in compiling high-level languages for multithreaded architectures. Non-preemptive multithreaded architectures, which can be built from off-the-shelf components, require that if a thread issues a potentially remote memory request, then any statement that is dependent upon this request must be in a separate thread.When performing thread partitioning on codes that use pointer-based recursive data structures, it is often difficult to extract accurate dependence information. As a result, threads of unnecessarily small granularity get generated, which, because of thread switching costs, leads to increased execution time. In this paper, we present three techniques that lead to improved extraction and representation of dependence information in the presence of structured control flow, references through fields of structures, and pointer-based data structures. The benefit of these techniques is the generation of coarser-grained threads and, therefore, decreased execution time. Our experiments were performed using the EARTH-C compiler and the EARTH multithreaded architecture model emulated on both a cluster of Pentium PCs and a distributed memory multiprocessor. On our set of 6 pointer-based programs, these techniques reduced the static number of threads by 38%. Reductions in execution times ranged from 16% to 45% on the four programs we measured runtime performance.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Anant Agarwal , Beng-Hong Lim , David Kranz , John Kubiatowicz, APRIL: a processor architecture for multiprocessing, Proceedings of the 17th annual international symposium on Computer Architecture, p.104-114, May 28-31, 1990, Seattle, Washington, United States
|
 |
2
|
Robert Alverson , David Callahan , Daniel Cummings , Brian Koblenz , Allan Porterfield , Burton Smith, The Tera computer system, Proceedings of the 4th international conference on Supercomputing, p.1-6, June 11-15, 1990, Amsterdam, The Netherlands
|
 |
3
|
Robert D. Blumofe , Christopher F. Joerg , Bradley C. Kuszmaul , Charles E. Leiserson , Keith H. Randall , Yuli Zhou, Cilk: an efficient multithreaded runtime system, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.207-216, July 19-21, 1995, Santa Barbara, California, United States
|
| |
4
|
|
 |
5
|
|
| |
6
|
|
 |
7
|
Maryam Emami , Rakesh Ghiya , Laurie J. Hendren, Context-sensitive interprocedural points-to analysis in the presence of function pointers, Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, p.242-256, June 20-24, 1994, Orlando, Florida, United States
|
 |
8
|
Rakesh Ghiya , Laurie J. Hendren, Is it a tree, a DAG, or a cyclic graph? A shape analysis for heap-directed pointers in C, Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, p.1-15, January 21-24, 1996, St. Petersburg Beach, Florida, United States
[doi> 10.1145/237721.237724]
|
| |
9
|
|
| |
10
|
L. J. Hendren , Xinan Tang , Yingchun Zhu , G. R. Gao , Xun Xue , Haiying Cai , P. Ouellet, Compiling C for the EARTH Multithreaded Architecture, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques, p.12, October 20-23, 1996
|
| |
11
|
Laurie J. Hendren , Xinan Tang , Yingchun Zhu , Shereen Ghobrial , Guang R. Gao , Xun Xue , Haiying Cai , Pierre Ouellet, Compiling C for the EARTH multithreaded architecture, International Journal of Parallel Programming, v.25 n.4, p.305-338, Aug. 1997
[doi> 10.1007/BF02699905]
|
| |
12
|
Herbert H. J. Hum , Olivier Maquelin , Kevin B. Theobald , Xinmin Tian , Xinan Tang , Guang R. Gao , Phil Cupryk , Nasser Elmasri , Laurie J. Hendren , Alberto Jimenez , Shoba Krishnan , Andres Marquez , Shamir Merali , Shashank S. Nemawarkar , Prakash Panangaden , Xun Xue , Yingchun Zhu, A design study of the EARTH multiprocessor, Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, p.59-68, June 27-29, 1995, Limassol, Cyprus
|
| |
13
|
Robert A. Iannucci. A datafiow/von Neumann hybrid architecture. MIT/LCS/TR-418, MIT for ACM Computing Surveys, Cambridge, July 1988. PhD thesis, May 1988.
|
 |
14
|
|
| |
15
|
Jack L. Lo , Susan J. Eggers , Henry M. Levy , Sujay S. Parekh , Dean M. Tullsen, Tuning compiler optimizations for simultaneous multithreading, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.114-124, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
| |
16
|
Rishiyur Nikhil. Cid: A parallel C for distributed memory machines. Technical report, Cambridge Research Laboratory, Digital Equipment Corporation, 1994.
|
 |
17
|
|
| |
18
|
|
 |
19
|
|
| |
20
|
|
| |
21
|
Klaus Eric Schauser, David E. Culler, and Thorsten yon Eiken. Compiler-controlled multithreading for lenient parallel languages. No. UCB/CSD 91/640, ACM Computing Surveys, at Berkeley, 1991.
|
| |
22
|
|
| |
23
|
|
| |
24
|
|
| |
25
|
|
 |
26
|
|
 |
27
|
Suan Hsi Yong , Susan Horwitz , Thomas Reps, Pointer analysis for programs with structures and casting, Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, p.91-103, May 01-04, 1999, Atlanta, Georgia, United States
|
 |
28
|
|
 |
29
|
|
| |
30
|
|
Peer to Peer - Readers of this Article have also read:
-
Data structures for quadtree approximation and compression
Communications of the ACM
28, 9
Hanan Samet
-
A hierarchical single-key-lock access control using the Chinese remainder theorem
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee
, Huizhu Lu
, D. D. Fisher
-
The GemStone object database management system
Communications of the ACM
34, 10
Paul Butterworth
, Allen Otis
, Jacob Stein
-
Putting innovation to work: adoption strategies for multimedia communication systems
Communications of the ACM
34, 12
Ellen Francik
, Susan Ehrlich Rudman
, Donna Cooper
, Stephen Levine
-
An intelligent component database for behavioral synthesis
Proceedings of the 27th ACM/IEEE conference on Design automation
Gwo-Dong Chen
, Daniel D. Gajski
|