ABSTRACT
Multi-core processors have become mainstream; they provide parallelism with relatively low complexity. As true on-chip SMPs evolve, coherence traffic between cores is becoming problematic in terms of both performance and power. The negative effects of coherence (snoop) traffic can be significantly mitigated through snoop filtering. Shielding each cache with a device that can squash snoop requests for addresses known not to be in the cache improves performance significantly for caches that cannot perform normal load and snoop lookups simultaneously. In addition, reducing snoop lookups yields power savings.
This paper introduces Stream Register snoop filtering, which captures the spatial locality of multiple memory reference streams in a few registers. We propose a snoop filter that combines Stream Registers with "snoop caching", a mechanism that captures the temporal locality of frequently accessed addresses. Simulations of SPLASH-2 benchmarks on a 4-core multiprocessor illustrate the tradeoffs and strengths of these two techniques. Their combination is most effective, eliminating 94-99% of all snoop requests using very few stream registers and snoop cache lines.
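The abstract does not describe the hardware layout, so the following is only a minimal sketch of how the two mechanisms could cooperate. It assumes each stream register holds a base address and a mask of compared bits, that every snoop is an invalidation, and that the register to widen is chosen by fewest lost mask bits. The names `StreamRegister`, `SnoopFilter`, `on_local_load`, and `on_snoop` are hypothetical and not taken from the paper.

```python
from collections import OrderedDict
from dataclasses import dataclass

ADDR_BITS = 32                      # assumed width of a cache-line address
FULL_MASK = (1 << ADDR_BITS) - 1


@dataclass
class StreamRegister:
    """Assumed layout: a base address plus a mask of bits that must match."""
    base: int = 0
    mask: int = FULL_MASK           # 1 = bit must equal base, 0 = don't care
    valid: bool = False

    def covers(self, addr: int) -> bool:
        return self.valid and ((addr ^ self.base) & self.mask) == 0

    def widen_cost(self, addr: int) -> int:
        # Number of compared bits that would become don't-care.
        return bin((addr ^ self.base) & self.mask).count("1")

    def widen(self, addr: int) -> None:
        # Stop comparing every bit in which addr differs from base.
        self.mask &= ~(addr ^ self.base)


class SnoopFilter:
    """Hypothetical combination of stream registers and a snoop cache."""

    def __init__(self, n_stream: int = 4, n_snoop: int = 8):
        self.regs = [StreamRegister() for _ in range(n_stream)]
        self.snoop_cache = OrderedDict()   # recently invalidated lines, oldest first
        self.n_snoop = n_snoop

    def on_local_load(self, addr: int) -> None:
        # The line is about to enter the cache, so it may no longer be filtered.
        self.snoop_cache.pop(addr, None)
        if any(r.covers(addr) for r in self.regs):
            return
        for r in self.regs:
            if not r.valid:
                r.base, r.mask, r.valid = addr, FULL_MASK, True
                return
        # Assumed policy: widen the register that loses the fewest mask bits.
        min(self.regs, key=lambda r: r.widen_cost(addr)).widen(addr)

    def on_snoop(self, addr: int) -> bool:
        """Return True if the snoop must be forwarded to the cache."""
        if addr in self.snoop_cache:
            return False                   # already invalidated, not reloaded since
        if not any(r.covers(addr) for r in self.regs):
            return False                   # no load stream ever touched this line
        self.snoop_cache[addr] = True      # forwarded invalidation removes the line
        if len(self.snoop_cache) > self.n_snoop:
            self.snoop_cache.popitem(last=False)
        return True
```

In this sketch the stream registers only ever grow more permissive, so they always describe a superset of the loaded lines and a miss in all registers is a safe reason to squash a snoop. A real design would also need some way to reset the registers once they become too permissive, which is omitted here. Evicting a snoop cache entry is always safe: it only means a later snoop for that line is forwarded instead of filtered.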