ABSTRACT
We consider three simple extensions to directory-based cache coherence protocols in shared-memory multiprocessors. These extensions are aimed at reducing the penalties associated with memory accesses and include a hardware prefetching scheme, a migratory sharing optimization, and a competitive-update mechanism. Since they target different components of the read and write penalties, they can be combined effectively. Detailed architectural simulations using five benchmarks show substantial combined performance gains obtained at a modest additional hardware cost. Prefetching together with competitive-update is the best combination under release consistency in systems with sufficient network bandwidth. By contrast, prefetching plus the migratory sharing optimization is advantageous under sequential consistency and/or in systems with limited network bandwidth.
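The abstract names a competitive-update mechanism without describing it. As a rough illustration only, the sketch below follows the general idea commonly associated with competitive update: a cached copy keeps receiving remote updates until it has received a threshold number of them with no local access in between, at which point the copy is dropped. The class name, the `simulate` helper, and the threshold value are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CachedBlock:
    """One cached copy under a competitive-update policy (illustrative only)."""
    threshold: int = 4          # hypothetical value, not taken from the paper
    valid: bool = True
    pending_updates: int = 0    # remote updates seen since the last local access

    def local_access(self) -> bool:
        """Local read or write: returns True on a hit and resets the counter."""
        if not self.valid:
            return False        # miss: the copy was dropped earlier
        self.pending_updates = 0
        return True

    def remote_update(self) -> bool:
        """Apply one remote update; returns True if the copy is still valid."""
        if not self.valid:
            return False        # no copy here, so nothing is delivered
        self.pending_updates += 1
        if self.pending_updates >= self.threshold:
            self.valid = False  # stop receiving updates for data nobody reads
        return self.valid


def simulate(events, threshold=4):
    """Replay 'local' / 'remote' events; return (still_valid, updates_received)."""
    block = CachedBlock(threshold=threshold)
    updates_received = 0
    for event in events:
        if event == "remote":
            if block.valid:
                updates_received += 1   # update traffic reaches this cache
            block.remote_update()
        elif event == "local":
            block.local_access()
    return block.valid, updates_received
```

In this toy model, a copy that is never accessed locally stops generating update traffic after `threshold` updates, while a copy that is accessed between updates stays valid indefinitely. This is the trade-off between update traffic and miss rate that the abstract's remark about network bandwidth alludes to.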
Combined performance gains of simple cache protocol extensions
Special Issue: Proceedings of the 21st Annual International Symposium on Computer Architecture (ISCA '94)