skip to main content
10.1145/1378533.1378536acmconferencesArticle/Chapter ViewAbstractPublication PagesspaaConference Proceedingsconference-collections
research-article

A consistency architecture for hierarchical shared caches

Published: 14 June 2008 Publication History

Abstract

Hierarchical Cache Consistency (HCC) is a scalable cache-consistency architecture for chip multiprocessors in which caches are shared hierarchically. HCC's cache-consistency protocol is embedded in the message-routing network that interconnects the caches, providing a distributed and scalable alternative to bus-based and directory-based consistency mechanisms. The HCC consistency protocol is "progressive" in that every message makes monotonic progress without timeouts, retries, negative acknowledgments, or retreating in any way. The latency is at most proportional to the diameter of the network. For HCC with a binary fat-tree network, the protocol requires at most 13 bits of additional state per cache line, no matter how large the system. We prove that the HCC protocol is deadlock free and provides sequential consistency.

References

[1]
M. Acacio, J. Gonzalez, J. Garcia, and J. Duato. A two-level directory architecture for highly scalable cc-NUMA multiprocessors. Trans. on Parallel and Distributed Systems, 16(1):67--79, 2005.
[2]
M. E. Acacio. An architecture for high-performance scalable shared-memory multiprocessors exploiting on-chip integration. IEEE Trans. on Parallel Distributed Systems, 15(8):755--768, 2004.
[3]
M. E. Acacio, J. Gonzalez, J. M. Garcia, and J. Duato. A new scalable directory architecture for large-scale multiprocessors. In IEEE HPCA, 2001.
[4]
Advanced Micro Devices. AMD64 Architecture Programmer's Manual, Volume 2: System Programming, July 2007.
[5]
A. Agarwal, R. Simoni, J. Hennessy, and M. Horowitz. An evaluation of directory schemes for cache coherence. In ISCA, 2008.
[6]
Arvind. Bluespec: A language for hardware design, simulation, synthesis and verification. In MEMOCODE, 2003.
[7]
J.-L. Baer and W.-H. Wang. On the inclusion properties for multi-level cache hierarchies. In ISCA, 1988.
[8]
E. G. Bolotin, Z. Cidon, I. Ginosar, and A. R. Kolodny. The power of priority: NoC based distributed cache coherency. In NOCS, 2007.
[9]
J. Chang and G. S. Sohi. Cooperative caching for chip multiprocessors. In ISCA, 2006.
[10]
Y. Chang and L. Bhuyan. An efficient tree cache coherence protocol for distributed shared memory multiprocessors. IEEE Trans. on Computers, 48(3):352--360, 1999.
[11]
L. Cheng, N. Muralimanohar, K. Ramani, R. Balasubramonian, and J. B. Carter. Interconnect-aware coherence protocols for chip multiprocessors. In ISCA, 2006.
[12]
D. R. Cheriton, H. A. Goosen, and P. D. Boyle. Multi-level shared caching techniques for scalability in VMP-M/C.In ISCA, pages 16--24, 1989.
[13]
W. J. Dally and B. Towles. Route packets, not wires: on-chip interconnection networks. In DAC, 2001.
[14]
N. Eisley, L.-S. Peh, and L. Shang. In-network cache coherence. In MICRO, 2006.
[15]
M. Frigo. The weakest reasonable memory model. Master's thesis, MIT EECS, 1998.
[16]
J. R. Goodman. Using cache memory to reduce processor-memory traffic. In ISCA, 1983.
[17]
M. Hill. Multiprocessors should support simple memory consistency models. IEEE Computer, 31(8):28--34, 1998.
[18]
Intel Corporation. Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1, October 2006.
[19]
S. Kaxiras and J. R. Goodman. The GLOW cache coherence protocol extensions for widely shared data. In ICS, 1996.
[20]
C. Kim, D. Burger, and S. W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. In ASPLOS-X, 2002.
[21]
R. Kumar, V. Zyuban, and D. M. Tullsen. Interconnections in multi-core architectures: understanding mechanisms, overheads and scaling. In ISCA, 2005.
[22]
C.-Y. Lam and S. E. Madnick. Propeties of storage hierarchy systems with multiple page sizes and redundant data. ACM Transaction on Database Systems, 4(3):345--367, 1979.
[23]
L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess progranm. IEEE Trans. on Computers, C-28(9):690--691, 1979.
[24]
F. T. Leighton and B. M. Maggs. Fast algorithms for routing around faults in multibutterflies and randomly-wired splitter networks. IEEE Transaction on Computers, 41(5):578--587, 1992.
[25]
C. E. Leiserson, Z. S. Abuhamdeh, D. C. Douglas, C. R. Feynman, M. N. Ganmukhi, J. V. Hill, W. D. Hillis, B. C. Kuszmaul, M. A. St. Pierre, D. S. Wells, M. C. Wong, S.-W. Yang, and R. Zak. The network architecture of the Connection Machine CM-5. In SPAA, 1992.
[26]
D. Lenoski, J. Laudon, T. Joe, D. Nakahira, L. Stevens, A. Gupta, and J. Hennessy. The DASH prototype: implementation and performance. In ISCA, 1998.
[27]
M. M. K. Martin, M. D. Hill, and D. A. Wood. Token coherence: decoupling performance and correctness. In ISCA, 2003.
[28]
M. R. Marty and M. D. Hill. Coherence ordering for ring-based chip multiprocessors. In MICRO, 2006.
[29]
M. R. Marty and M. D. Hill. Virtual hierarchies to support server consolidation. In ISCA, 2007.
[30]
H. E. Mizrahi, J. L. Baer, E. D. Lazowska, and J. Zahorjan. Introducing memory into the switch elements of multiprocessor interconnection networks. In ISCA, 1989.
[31]
S. Mori, H. Saito, M. Goshima, S. Tomita, M. Yanagihara, T. Tanaka, D. Fraser, K. Joe, and H. Nitta. A distributed shared memory multiprocessor ASURA: memory and cache architecture. In Supercomputing, 1993.
[32]
H. Nilsson and P. Stenstrom. The scalable tree protocol-a cache coherence approach for large-scale multiprocessors. In IPDPS, 1992.
[33]
D. A. Patterson and J. L. Hennessy. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann, second edition, 1998.
[34]
S. Przybylski, M. Horowitz, and J. Hennessy. Characteristics of performance-optimal multi-level cache hierarchies. In ISCA, 1989.
[35]
A. Ros, M. E. Acacio, and J. M. Garcia. An efficient cache design for scalable glueless shared-memory multiprocessors. In Computing Frontiers. ACM Press, 2006.
[36]
X. Shen and Arvind. Specification of memory models and design of provably correct cache coherence protocols. Technical Report 398, MIT Computation Structures Group, 1997.
[37]
X. Shen, Arvind, and L. Rudolph. CACHET: an adaptive cache coherence protocol for distributed shared-memory systems. In ICS, 1999.
[38]
E. Speight, H. Shafi, L. Zhang, and R. Rajamony. Adaptive mechanisms and policies for managing cache hierarchies in chip multiprocessors. In ISCA, 2005.
[39]
K. Strauss, X. Shen, and J. Torrellas. Flexible snooping: adaptive forwarding and filtering of snoops in embedded-ring multiprocessors. ISCA, 2006.
[40]
A. W. Wilson, Jr.Hierarchical cache/bus architecture for shared memory multiprocessors. In ISCA, 1987.
[41]
Q. Yang, G. Thangadurai, and L. M. Bhuyan. Design of an adaptive cache coherence protocol for large scale multiprocessors. IEEE Trans. on Parallel and Distributed Systems, 3(3):281--293, 1992.

Cited By

View all
  • (2020)HieraGen: Automated Generation of Concurrent, Hierarchical Cache Coherence Protocols2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA45697.2020.00077(888-899)Online publication date: May-2020
  • (2019)CONCORD: Improving Communication using Consumer-Count Detection2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA)10.1109/AICCSA47632.2019.9035220(1-11)Online publication date: Nov-2019
  • (2017)Architecting hierarchical coherence protocols for push-button parametric verificationProceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3123939.3123971(477-489)Online publication date: 14-Oct-2017
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SPAA '08: Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
June 2008
380 pages
ISBN:9781595939739
DOI:10.1145/1378533
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. cache consistency
  2. deadlock
  3. fat-tree
  4. mapping collision
  5. memory hierarchy
  6. message race
  7. progressive protocol
  8. sequential consistency
  9. shared caches

Qualifiers

  • Research-article

Conference

SPAA08

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%

Upcoming Conference

SPAA '25
37th ACM Symposium on Parallelism in Algorithms and Architectures
July 28 - August 1, 2025
Portland , OR , USA

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)10
  • Downloads (Last 6 weeks)0
Reflects downloads up to 22 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2020)HieraGen: Automated Generation of Concurrent, Hierarchical Cache Coherence Protocols2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA45697.2020.00077(888-899)Online publication date: May-2020
  • (2019)CONCORD: Improving Communication using Consumer-Count Detection2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA)10.1109/AICCSA47632.2019.9035220(1-11)Online publication date: Nov-2019
  • (2017)Architecting hierarchical coherence protocols for push-button parametric verificationProceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3123939.3123971(477-489)Online publication date: 14-Oct-2017
  • (2015)Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)10.1109/HPCA.2015.7056032(186-197)Online publication date: Feb-2015
  • (2014)PVCoherence: Designing flat coherence protocols for scalable verification2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)10.1109/HPCA.2014.6835949(392-403)Online publication date: Feb-2014
  • (2013)Using in-flight chains to build a scalable cache coherence protocolACM Transactions on Architecture and Code Optimization10.1145/2541228.254123510:4(1-24)Online publication date: 1-Dec-2013
  • (2013)High-speed formal verification of heterogeneous coherence hierarchiesProceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)10.1109/HPCA.2013.6522350(566-577)Online publication date: 23-Feb-2013
  • (2011)Manager-client pairingProceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture10.1145/2155620.2155647(226-236)Online publication date: 3-Dec-2011
  • (2010)Low depth cache-oblivious algorithmsProceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures10.1145/1810479.1810519(189-199)Online publication date: 13-Jun-2010
  • (2009)NBC: Network-based cache coherence protocol for multistage NoCs2009 International SoC Design Conference (ISOCC)10.1109/SOCDC.2009.5423850(337-340)Online publication date: Nov-2009

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media