Efficient strategies for software-only protocols in shared-memory multiprocessors

Authors:
Håkan Grahn

Department of Computer Engineering, Lund University, P.O. Box 118, S-221 00 LUND, Sweden

Per Stenström

Department of Computer Engineering, Lund University, P.O. Box 118, S-221 00 LUND, Sweden


ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture, July 1995, pages 38–47. https://doi.org/10.1145/223982.225958

Published: 1 May 1995

ABSTRACT

The cost, complexity, and inflexibility of hardware-based directory protocols motivate us to study the performance implications of protocols that emulate directory management using software handlers executed on the compute processors. An important performance limitation of such software-only protocols is that software latency associated with directory management ends up on the critical memory access path for read miss transactions. We propose five strategies that support efficient data transfers in hardware whereas directory management is handled at a slower pace in the background by software handlers. Simulations show that this approach can remove the directory-management latency from the memory access path. Whereas the directory is managed in software, the hardware mechanisms must access the memory state in order to enable data transfers at a high speed. Overall, our strategies reach between 60% and 86% of the hardware-based protocol performance.
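The core idea in the abstract, taking directory bookkeeping off the read-miss critical path, can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `HomeNode`, its methods, and both cycle counts are hypothetical and are not taken from the paper, and the model does not correspond to any of the five proposed strategies. It also ignores the memory-state check that the abstract says the hardware must still perform.

```python
from collections import deque

# Hypothetical latencies in cycles; NOT taken from the paper.
HW_DATA_TRANSFER = 30
SW_DIRECTORY_HANDLER = 100


class HomeNode:
    """Toy home node: a software-managed sharer directory for memory blocks."""

    def __init__(self, decoupled):
        self.decoupled = decoupled
        self.directory = {}      # block address -> set of sharer node ids
        self.pending = deque()   # directory updates deferred to software

    def _update_directory(self, block, requester):
        self.directory.setdefault(block, set()).add(requester)

    def read_miss(self, block, requester):
        """Return the latency (in cycles) the requester sees for one read miss."""
        if self.decoupled:
            # Data reply is sent right away; bookkeeping is queued for later.
            self.pending.append((block, requester))
            return HW_DATA_TRANSFER
        # Baseline: the software handler runs before the data reply is sent.
        self._update_directory(block, requester)
        return SW_DIRECTORY_HANDLER + HW_DATA_TRANSFER

    def run_background_handlers(self):
        """Drain deferred updates; return the cycles of software work done."""
        cycles = 0
        while self.pending:
            block, requester = self.pending.popleft()
            self._update_directory(block, requester)
            cycles += SW_DIRECTORY_HANDLER
        return cycles
```

In this model the same amount of software work is done in both modes. The decoupled mode only moves it off the path the requester waits on, which is the effect the abstract attributes to its strategies.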

Index Terms

Efficient strategies for software-only protocols in shared-memory multiprocessors
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory

Published in
ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture
July 1995
426 pages
ISBN:0897916980
DOI:10.1145/223982
Chairman:
David A. Patterson
Univ. of California, Berkeley
ACM SIGARCH Computer Architecture News Volume 23, Issue 2
Special Issue: Proceedings of the 22nd annual international symposium on Computer architecture (ISCA '95)
May 1995
412 pages
ISSN:0163-5964
DOI:10.1145/225830
Chairman:
David A. Patterson
Univ. of California, Berkeley
Copyright © 1995 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
