DOI: 10.1145/1327171.1327174
research-article

Improving the accuracy of snoop filtering using stream registers

Published: 16 September 2007

ABSTRACT

Multi-core processors have become mainstream; they provide parallelism with relatively low complexity. As true on-chip SMPs evolve, coherence traffic between cores is becoming problematic in terms of both performance and power. The negative effects of coherence (snoop) traffic can be significantly mitigated through snoop filtering. Shielding each cache with a device that squashes snoop requests for addresses known not to be in the cache significantly improves performance for caches that cannot perform normal load lookups and snoop lookups simultaneously. In addition, reducing the number of snoop lookups yields power savings.
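The filtering rule above carries one hard constraint: a filter may forward a snoop unnecessarily (costing a wasted cache lookup), but it must never squash a snoop for a line the cache actually holds. The following Python sketch illustrates that contract only. The 32-byte line size, the names `SnoopFilter`, `ExactFilter`, `line_of`, and `handle_snoop`, and the exact-tracking baseline (comparable to keeping a duplicate copy of the cache's line addresses) are hypothetical and are not taken from the paper.

```python
from typing import Protocol, Set

LINE_BYTES = 32  # hypothetical cache-line size, for illustration only


def line_of(addr: int) -> int:
    """Map a byte address to its cache-line address."""
    return addr // LINE_BYTES


class SnoopFilter(Protocol):
    """Any filter answers 'may this line be cached?' conservatively."""

    def may_contain(self, line: int) -> bool: ...


class ExactFilter:
    """Hypothetical baseline that tracks exactly the lines the cache holds."""

    def __init__(self) -> None:
        self.lines: Set[int] = set()

    def on_fill(self, line: int) -> None:
        self.lines.add(line)

    def on_evict(self, line: int) -> None:
        self.lines.discard(line)

    def may_contain(self, line: int) -> bool:
        return line in self.lines


def handle_snoop(filt: SnoopFilter, cache: Set[int], addr: int) -> bool:
    """Return True if the snoop reaches the cache, False if it is squashed."""
    line = line_of(addr)
    if not filt.may_contain(line):
        # Safety check for this simulation: a squashed snoop must not
        # target a line that is resident in the cache.
        assert line not in cache, "filter produced a false negative"
        return False
    return True  # forwarded: the cache performs a tag lookup
```

An exact tracker never forwards needlessly but costs as much state as the cache's tag array; cheaper filters, such as the ones proposed here, trade some unnecessary forwarding for far less state.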

This paper introduces Stream Register snoop filtering, which captures the spatial locality of multiple memory reference streams in a few registers. We propose a snoop filter that combines Stream Registers with "snoop caching", a mechanism that captures the temporal locality of frequently accessed addresses. Simulations of SPLASH-2 benchmarks on a 4-core multiprocessor illustrate the tradeoffs and strengths of these two techniques. Their combination is most effective, eliminating 94-99% of all snoop requests using very few stream registers and snoop cache lines.
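One way to picture how Stream Registers and a snoop cache might cooperate is the Python sketch below. It assumes, rather than reproduces from the paper, three things. First, each stream register holds a base line address and a mask and matches any line L with (L & mask) == (base & mask). Second, a register grows by clearing the mask bits where a newly filled line differs from its base. Third, the snoop cache remembers line addresses that recently passed the stream registers but missed in the cache, and it drops an entry when the local core fills that line. The register-selection heuristic, the 27-bit line-address width, the LRU replacement policy, and all names are hypothetical. The sketch also omits any mechanism for resetting stream registers, which a practical design would need because masks only ever lose bits.

```python
from collections import OrderedDict
from typing import Optional

LINE_ADDR_BITS = 27  # hypothetical width of a cache-line address
FULL_MASK = (1 << LINE_ADDR_BITS) - 1


class StreamRegister:
    """Base/mask pair matching every line L with (L & mask) == (base & mask)."""

    def __init__(self) -> None:
        self.valid = False
        self.base = 0
        self.mask = FULL_MASK

    def matches(self, line: int) -> bool:
        return self.valid and (line & self.mask) == (self.base & self.mask)

    def cost(self, line: int) -> int:
        """Number of mask bits that would be cleared to cover `line`."""
        return bin((self.base ^ line) & self.mask).count("1")

    def absorb(self, line: int) -> None:
        if not self.valid:
            self.valid, self.base, self.mask = True, line, FULL_MASK
        else:
            # Clear every mask bit where the new line differs from the base.
            self.mask &= ~(self.base ^ line) & FULL_MASK


class StreamRegisterSnoopFilter:
    """Hypothetical combination of stream registers and a small snoop cache."""

    def __init__(self, n_registers: int = 4, snoop_cache_lines: int = 8) -> None:
        self.regs = [StreamRegister() for _ in range(n_registers)]
        self.snoop_cache: "OrderedDict[int, None]" = OrderedDict()
        self.capacity = snoop_cache_lines

    def on_local_fill(self, line: int) -> None:
        """Called when the local cache allocates `line`."""
        self.snoop_cache.pop(line, None)  # the line is no longer known-absent
        if any(r.matches(line) for r in self.regs):
            return
        # Prefer an unused register; otherwise widen the cheapest one.
        free: Optional[StreamRegister] = next(
            (r for r in self.regs if not r.valid), None)
        target = free if free is not None else min(
            self.regs, key=lambda r: r.cost(line))
        target.absorb(line)

    def should_forward(self, line: int) -> bool:
        """False means the snoop can be squashed without a cache lookup."""
        if line in self.snoop_cache:
            self.snoop_cache.move_to_end(line)  # refresh LRU position
            return False
        return any(r.matches(line) for r in self.regs)

    def on_snoop_miss(self, line: int) -> None:
        """Called when a forwarded snoop found `line` absent from the cache."""
        self.snoop_cache[line] = None
        self.snoop_cache.move_to_end(line)
        if len(self.snoop_cache) > self.capacity:
            self.snoop_cache.popitem(last=False)  # evict least recently used
```

Both structures err only toward forwarding in this sketch: a stream register keeps matching every line it has ever absorbed, and snoop-cache entries name only lines known to be absent, so neither squashes a snoop to a resident line. The stream registers cover long address ranges with little state (spatial locality), while the snoop cache catches repeated snoops to a few non-resident lines inside those ranges (temporal locality).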


  • Published in

    ACM Conferences
    MEDEA '07: Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
    September 2007
    113 pages
    ISBN: 9781595938077
    DOI: 10.1145/1327171

    Copyright © 2007 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate: 6 of 9 submissions, 67%