ABSTRACT
The cost, complexity, and inflexibility of hardware-based directory protocols motivate us to study the performance implications of protocols that emulate directory management using software handlers executed on the compute processors. An important performance limitation of such software-only protocols is that the software latency associated with directory management ends up on the critical memory access path for read miss transactions. We propose five strategies that support efficient data transfers in hardware while directory management is handled at a slower pace in the background by software handlers. Simulations show that this approach can remove the directory-management latency from the memory access path. Although the directory is managed in software, the hardware mechanisms must still access the memory state to enable high-speed data transfers. Overall, our strategies achieve between 60% and 86% of the performance of a hardware-based protocol.
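The split between a hardware data path and a background software directory handler can be illustrated with a small toy model. The sketch below is not one of the five strategies from the paper; the class `HomeNode`, its method names, and both cycle counts are hypothetical and chosen only to show why taking the handler off the read-miss path shortens the critical latency.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical cycle counts, chosen only for illustration (not from the paper).
MEMORY_LATENCY = 20      # cycles for hardware to fetch and send a block
HANDLER_LATENCY = 100    # cycles for a software directory handler


@dataclass
class HomeNode:
    """Toy home node: hardware serves data, software maintains the directory."""
    memory_valid: dict = field(default_factory=dict)  # block -> bool, memory state visible to hardware
    sharers: dict = field(default_factory=dict)       # block -> set of node ids, software-managed
    pending: deque = field(default_factory=deque)     # directory updates awaiting the handler

    def read_miss_software_only(self, block, requester):
        """Baseline: the handler runs before the data reply is sent."""
        self.sharers.setdefault(block, set()).add(requester)
        return HANDLER_LATENCY + MEMORY_LATENCY

    def read_miss_decoupled(self, block, requester):
        """Hardware replies at once if memory holds a valid copy; the
        directory update is queued for the handler to process later."""
        if not self.memory_valid.get(block, True):
            # Simplification: a stale memory copy falls back to the software path.
            return self.read_miss_software_only(block, requester)
        self.pending.append((block, requester))
        return MEMORY_LATENCY

    def run_background_handler(self):
        """Drain queued directory updates off the critical path."""
        handled = 0
        while self.pending:
            block, requester = self.pending.popleft()
            self.sharers.setdefault(block, set()).add(requester)
            handled += 1
        return handled
```

In this model the hardware only needs the per-block memory state (`memory_valid`) to decide whether it can reply immediately, which mirrors the abstract's point that the hardware mechanisms must access the memory state even though the directory itself is managed in software.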
Efficient strategies for software-only protocols in shared-memory multiprocessors
Special Issue: Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA '95)