Evaluating the locality benefits of active messages

ABSTRACT
A major challenge in fine-grained computing is achieving locality without excessive scheduling overhead. We built two J-Machine implementations of a fine-grained programming model, the Berkeley Threaded Abstract Machine (TAM). One implementation takes an Active Messages approach, maintaining a scheduling hierarchy in software to improve data cache performance. The other relies on the J-Machine's hardware message queues and fast task switching, lowering control costs at the expense of data locality. Our analysis measures the costs and benefits of each approach across a variety of programs and cache configurations. The Active Messages implementation is strongest when miss penalties are high and for the finest-grained programs; the hardware-buffered implementation is strongest with direct-mapped caches, where it achieves substantially better instruction cache performance.
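The trade-off between the two implementations can be illustrated with a toy scheduler sketch. This is a hypothetical model, not the paper's actual TAM runtime: the hardware-buffered style dispatches messages in FIFO arrival order, while the Active Messages style maintains a software scheduling structure that drains all pending messages for one activation frame before moving on, so consecutive tasks touch the same frame data.

```python
from collections import deque, defaultdict

# Each message is (frame, handler). A data cache favors runs of tasks
# that touch the same activation frame. Names here are illustrative.

def fifo_schedule(messages):
    """Hardware-buffered style: dispatch in arrival order."""
    q = deque(messages)
    order = []
    while q:
        order.append(q.popleft())
    return order

def frame_grouped_schedule(messages):
    """Active Messages style: a software scheduling structure that
    drains all pending messages for one frame before moving on."""
    by_frame = defaultdict(list)
    arrival = []                      # first-seen frame order
    for frame, handler in messages:
        if frame not in by_frame:
            arrival.append(frame)
        by_frame[frame].append((frame, handler))
    order = []
    for frame in arrival:
        order.extend(by_frame[frame])
    return order

def frame_switches(order):
    """Count adjacent tasks that change frames -- a rough proxy for
    data cache misses caused by switching working sets."""
    return sum(1 for a, b in zip(order, order[1:]) if a[0] != b[0])

msgs = [("f1", "t0"), ("f2", "t0"), ("f1", "t1"), ("f2", "t1"), ("f1", "t2")]
print(frame_switches(fifo_schedule(msgs)))           # 4 frame switches
print(frame_switches(frame_grouped_schedule(msgs)))  # 1 frame switch
```

The grouped schedule trades the bookkeeping of the software hierarchy (the control cost) for fewer working-set changes (the locality benefit), which is exactly the tension the abstract describes.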