DOI: 10.1145/209936.209956

Evaluating the locality benefits of active messages

Published: 01 August 1995

ABSTRACT

A major challenge in fine-grained computing is achieving locality without excessive scheduling overhead. We built two J-Machine implementations of a fine-grained programming model, the Berkeley Threaded Abstract Machine. The first takes an Active Messages approach, maintaining a scheduling hierarchy in software in order to improve data cache performance. The second relies on the J-Machine's message queues and fast task switch, lowering control costs at the expense of data locality. Our analysis measures the costs and benefits of each approach for a variety of programs and cache configurations. The Active Messages implementation is strongest when miss penalties are high and for the finest-grained programs. The hardware-buffered implementation is strongest in direct-mapped caches, where it achieves substantially better instruction cache performance.
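
The trade-off described in the abstract (more control overhead in exchange for fewer cache misses) can be illustrated with a simple linear cost model. The sketch below is not taken from the paper: the `Impl` class, the `break_even_penalty` helper, and every number in it are hypothetical placeholders, chosen only to show why an implementation with better locality would be expected to win once the miss penalty is high enough.

```python
from dataclasses import dataclass


@dataclass
class Impl:
    """Hypothetical cost profile of one implementation over a whole run."""
    control_cycles: int  # scheduling and dispatch overhead, in cycles
    cache_misses: int    # total cache misses during the run

    def total_cycles(self, miss_penalty: int) -> int:
        # Linear model: fixed control cost plus a per-miss stall cost.
        return self.control_cycles + self.cache_misses * miss_penalty


def break_even_penalty(a: Impl, b: Impl) -> float:
    """Miss penalty (in cycles) at which a and b cost the same.

    Solves a.control + a.misses * p == b.control + b.misses * p for p.
    """
    if a.cache_misses == b.cache_misses:
        raise ValueError("equal miss counts: no unique break-even point")
    return (a.control_cycles - b.control_cycles) / (b.cache_misses - a.cache_misses)


# Placeholder numbers, NOT measurements from the paper.
software_scheduled = Impl(control_cycles=1_200_000, cache_misses=20_000)
hardware_buffered = Impl(control_cycles=800_000, cache_misses=60_000)

if __name__ == "__main__":
    p = break_even_penalty(software_scheduled, hardware_buffered)
    print(f"break-even miss penalty: {p} cycles")
    for penalty in (5, 20):
        sw = software_scheduled.total_cycles(penalty)
        hw = hardware_buffered.total_cycles(penalty)
        winner = "software-scheduled" if sw < hw else "hardware-buffered"
        print(f"penalty={penalty}: {winner} is cheaper")
```

Under these assumed numbers, the software-scheduled profile only pays off above the break-even penalty. The paper's actual analysis covers multiple programs and cache configurations, which a single linear model like this does not capture.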

    Published in

      PPOPP '95: Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
      August 1995
      234 pages
      ISBN: 0897917006
      DOI: 10.1145/209936

      Copyright © 1995 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall acceptance rate: 230 of 1,014 submissions (23%)
