Article

Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors

Authors:
Karin Strauss, University of Illinois, Urbana-Champaign
Xiaowei Shen, IBM T. J. Watson Research Center
Josep Torrellas, University of Illinois, Urbana-Champaign
ISCA '06: Proceedings of the 33rd annual international symposium on Computer Architecture, June 2006, pages 327–338. https://doi.org/10.1109/ISCA.2006.21

Published: 1 May 2006


ABSTRACT

A simple and low-cost approach to supporting snoopy cache coherence is to logically embed a unidirectional ring in the network of a multiprocessor and use it to transfer snoop messages. Other messages can use any link in the network. While this scheme works for any network topology, a naive implementation may result in long response times or in many snoop messages and snoop operations. To address this problem, this paper proposes Flexible Snooping algorithms, a family of adaptive forwarding and filtering snooping algorithms. In these algorithms, a node receiving a snoop request may forward it to another node and then perform the snoop, snoop and then forward it, or simply forward it without snooping. The resulting design space offers trade-offs in the number of snoop operations and messages, response time, and energy consumption. Our analysis using SPLASH-2, SPECjbb, and SPECweb workloads finds several snooping algorithms that are more cost-effective than current ones. Specifically, our choice for a high-performance snooping algorithm is faster than the currently fastest algorithm while consuming 9–17% less energy; our choice for an energy-efficient algorithm is only 3–6% slower than the previous one while consuming 36–42% less energy.
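The three per-node choices named in the abstract (forward then snoop, snoop then forward, forward without snooping) can be illustrated with a toy model. This is a sketch under stated assumptions, not the paper's protocol or cost model: the names `Action`, `RingCost`, and `simulate`, the fixed per-hop and per-snoop latencies, the rule that a forward-then-snoop node reports its outcome in a separate reply, and the rule that later nodes stop snooping once a snoop-then-forward supplier has marked the request are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    """What a node does with an incoming snoop request (hypothetical names)."""
    FORWARD_THEN_SNOOP = 1  # pass the request on, then snoop locally
    SNOOP_THEN_FORWARD = 2  # snoop first, then forward with the outcome attached
    FORWARD_ONLY = 3        # filtered: forward without snooping


@dataclass
class RingCost:
    snoops: int                   # snoop operations performed
    messages: int                 # ring hops plus separate snoop replies
    supplier_time: Optional[int]  # cycle at which the supplier's snoop completes


def simulate(num_nodes: int, supplier: int,
             policy: Callable[[int], Action],
             hop: int = 10, snoop: int = 20) -> RingCost:
    """Send one snoop request from node 0 around a unidirectional ring.

    Toy cost model (assumed): `hop` cycles per ring link, `snoop` cycles per
    snoop, and forwarding itself takes no time.
    """
    t = 0               # cycle at which the request leaves the current node
    snoops = 0
    messages = 0
    found = False       # set once the request carries the supplier's outcome
    supplier_time = None
    for node in range(1, num_nodes):
        arrive = t + hop
        messages += 1   # the hop that delivered the request to `node`
        # Assumed: once the request is marked, later nodes skip their snoops.
        action = Action.FORWARD_ONLY if found else policy(node)
        if action is Action.FORWARD_ONLY:
            t = arrive
            continue
        snoops += 1
        done = arrive + snoop
        if node == supplier:
            supplier_time = done
        if action is Action.FORWARD_THEN_SNOOP:
            t = arrive      # request already left; outcome goes in its own reply
            messages += 1
        else:
            t = done        # request waits for the snoop result
            found = node == supplier
    messages += 1       # final hop back to the requester (node 0)
    return RingCost(snoops, messages, supplier_time)


if __name__ == "__main__":
    stf = simulate(8, 3, lambda n: Action.SNOOP_THEN_FORWARD)
    fts = simulate(8, 3, lambda n: Action.FORWARD_THEN_SNOOP)
    perfect = simulate(8, 3, lambda n: Action.SNOOP_THEN_FORWARD if n == 3
                       else Action.FORWARD_ONLY)
    for name, r in [("snoop-then-forward", stf),
                    ("forward-then-snoop", fts),
                    ("perfect filter", perfect)]:
        print(name, r.snoops, r.messages, r.supplier_time)
    # snoop-then-forward 3 8 90
    # forward-then-snoop 7 15 50
    # perfect filter 1 8 50
```

In this toy model, snooping before forwarding saves snoops and messages but locates the supplier later, while forwarding first locates it sooner at the cost of extra snoops and replies. A perfect supplier predictor (assumed here, not modeled) combines the low counts with the early completion. If a filter wrongly skips the supplier, `supplier_time` stays `None`, a case a real protocol would have to handle.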


Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
2. Hardware
  1. Hardware validation
  2. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published in
ISCA '06: Proceedings of the 33rd annual international symposium on Computer Architecture
June 2006
383 pages
ISBN: 076952608X
ACM SIGARCH Computer Architecture News Volume 34, Issue 2
May 2006
383 pages
ISSN: 0163-5964
DOI: 10.1145/1150019
Publisher
IEEE Computer Society
United States
Qualifiers
- Article
- Conference

Acceptance Rates
ISCA '06 Paper Acceptance Rate: 31 of 234 submissions, 13%
Overall Acceptance Rate: 543 of 3,203 submissions, 17%
Article Metrics
- 23 Total Citations
- 578 Total Downloads
- Downloads (Last 12 months): 1
- Downloads (Last 6 weeks): 1