research-article

Improving the accuracy of snoop filtering using stream registers

Authors:
Valentina Salapura, IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Matthias Blumrich, IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Alan Gara, IBM Thomas J. Watson Research Center, Yorktown Heights, NY

MEDEA '07: Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture, September 2007, pages 25–32. https://doi.org/10.1145/1327171.1327174

Published: 16 September 2007

ABSTRACT

Multi-core processors have become mainstream; they provide parallelism with relatively low complexity. As true on-chip SMPs evolve, coherence traffic between cores is becoming problematic, both in terms of performance and power. The negative effects of coherence (snoop) traffic can be significantly mitigated through snoop filtering. Shielding each cache with a device that can squash snoop requests for addresses known not to be in cache improves performance significantly for caches that cannot perform normal load and snoop lookups simultaneously. In addition, reducing snoop lookups yields power savings.
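The shielding idea can be made concrete with a small harness. This is a hypothetical sketch, not code from the paper: `run_snoops` and the toy range-based filter are invented for illustration. It counts how many snoops reach the cache and checks the one property every snoop filter must keep: it may forward a snoop for a line that is not cached (a wasted lookup), but it must never squash a snoop for a line that is cached.

```python
from typing import Callable, Iterable, Set, Tuple


def run_snoops(snoops: Iterable[int],
               cached_lines: Set[int],
               may_be_cached: Callable[[int], bool]) -> Tuple[int, int]:
    """Replay snoop requests (line addresses) against one cache shielded by a filter.

    Returns (forwarded, squashed). `may_be_cached` is the filter's test and must be
    conservative: returning True for an absent line only wastes a lookup, but
    returning False for a cached line would silently drop a coherence action.
    """
    forwarded = squashed = 0
    for line in snoops:
        if may_be_cached(line):
            forwarded += 1  # snoop reaches the cache and costs a tag lookup
        else:
            if line in cached_lines:
                raise AssertionError(f"unsafe filter: squashed snoop to cached line {line:#x}")
            squashed += 1   # lookup avoided: the tag array stays free for the core's loads
    return forwarded, squashed


# Hypothetical toy filter: it only knows that no line above 0x1ff is cached.
def toy_filter(line: int) -> bool:
    return line <= 0x1FF


cached = {0x100, 0x101}
print(run_snoops([0x100, 0x150, 0x300, 0x400], cached, toy_filter))  # (2, 2)
```

Here 0x150 is forwarded even though it is not cached: the toy filter is imprecise but safe, and each such false positive is a lookup a better filter could have saved.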

This paper introduces Stream Register snoop filtering, which captures the spatial locality of multiple memory reference streams in a few registers. We propose a snoop filter that combines Stream Registers with "snoop caching", a mechanism that captures the temporal locality of frequently accessed addresses. Simulations of SPLASH-2 benchmarks on a 4-core multiprocessor illustrate the tradeoffs and strengths of these two techniques. Their combination is the most effective, eliminating 94-99% of all snoop requests using very few stream registers and snoop cache lines.
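The two mechanisms named above can be sketched together. This abstract does not specify the register format, the allocation policy or the snoop-cache organization, so all three are assumptions here: each stream register is modelled as a base/mask pair that is widened (mask bits cleared) as lines are filled, a new line goes to a free register or else to the register that loses the fewest mask bits, and the snoop cache is a small LRU set of line addresses that a forwarded snoop found absent. All class and method names are hypothetical. A snoop is squashed when it hits the snoop cache or misses every stream register.

```python
from collections import OrderedDict
from typing import List, Optional

ADDR_BITS = 32                     # hypothetical width of a cache-line address
FULL_MASK = (1 << ADDR_BITS) - 1


class StreamRegister:
    """Assumed base/mask pair: covers every line that agrees with `base` on all bits set in `mask`."""

    def __init__(self) -> None:
        self.valid = False
        self.base = 0
        self.mask = FULL_MASK

    def matches(self, line: int) -> bool:
        return self.valid and (line & self.mask) == (self.base & self.mask)

    def bits_lost(self, line: int) -> int:
        """How many mask bits would be cleared (coverage widened) to admit `line`."""
        return bin((self.base ^ line) & self.mask).count("1")

    def add(self, line: int) -> None:
        if not self.valid:
            self.valid, self.base, self.mask = True, line, FULL_MASK
        else:
            # Clear every mask bit where `line` differs from `base`. Coverage only grows,
            # so every line ever added keeps matching (no false negatives). This sketch
            # never shrinks a register; a real filter would need a reset mechanism.
            self.mask &= ~(self.base ^ line)


class CombinedSnoopFilter:
    """Stream registers (spatial locality) plus a small LRU snoop cache of lines known absent."""

    def __init__(self, n_regs: int = 4, snoop_cache_lines: int = 8) -> None:
        self.regs: List[StreamRegister] = [StreamRegister() for _ in range(n_regs)]
        self.absent: "OrderedDict[int, None]" = OrderedDict()  # snoop cache, oldest first
        self.capacity = snoop_cache_lines

    def on_fill(self, line: int) -> None:
        """Local processor brings `line` into its cache (must be called on every fill)."""
        self.absent.pop(line, None)          # line is no longer known to be absent
        if any(r.matches(line) for r in self.regs):
            return
        free: Optional[StreamRegister] = next((r for r in self.regs if not r.valid), None)
        # Hypothetical policy: use a free register, else widen the one that loses fewest bits.
        target = free if free is not None else min(self.regs, key=lambda r: r.bits_lost(line))
        target.add(line)

    def should_forward(self, line: int) -> bool:
        """True if the snoop must be looked up in the cache; False if it can be squashed."""
        if line in self.absent:
            self.absent.move_to_end(line)    # refresh LRU position
            return False
        return any(r.matches(line) for r in self.regs)

    def on_snoop_miss(self, line: int) -> None:
        """A forwarded snoop found `line` absent: remember it so repeats are squashed."""
        self.absent[line] = None
        self.absent.move_to_end(line)
        if len(self.absent) > self.capacity:
            self.absent.popitem(last=False)  # evict the least recently used entry


f = CombinedSnoopFilter(n_regs=1, snoop_cache_lines=4)
f.on_fill(0x1000)
f.on_fill(0x1003)                 # register now covers 0x1000..0x1003
print(f.should_forward(0x1001))  # True: stream-register false positive
f.on_snoop_miss(0x1001)           # the cache lookup found 0x1001 absent
print(f.should_forward(0x1001))  # False: snoop cache squashes the repeat
print(f.should_forward(0x2000))  # False: outside every stream register
```

In this sketch the two structures cover each other's weaknesses: stream registers summarize whole address ranges in a few registers but over-approximate, while the snoop cache remembers individual addresses that keep being snooped despite being absent.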




Published in
MEDEA '07: Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
September 2007
113 pages
ISBN:9781595938077
DOI:10.1145/1327171
Conference Chairs: Pierfrancesco Foglia (University of Pisa), Cosimo Antonio Prete (University of Pisa), Sandro Bartolini (University of Siena), Roberto Giorgi (University of Siena)
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall acceptance rate: 6 of 9 submissions, 67%

Article Metrics
- 18 total citations
- 388 total downloads
- Downloads (last 12 months): 11
- Downloads (last 6 weeks): 0