research-article

NCID: a non-inclusive cache, inclusive directory architecture for flexible and efficient cache hierarchies

Authors:
Li Zhao (Intel, Hillsboro, USA),
Ravi Iyer (Intel, Hillsboro, USA),
Srihari Makineni (Intel, Hillsboro, OR, USA),
Don Newell (Intel, Hillsboro, USA),
Liqun Cheng (Intel, Hillsboro, USA)

CF '10: Proceedings of the 7th ACM international conference on Computing frontiers, May 2010, Pages 121–130, https://doi.org/10.1145/1787275.1787314

Published: 17 May 2010

ABSTRACT

Chip-multiprocessor (CMP) architectures, such as Intel's Nehalem and AMD's Barcelona processors, employ multi-level cache hierarchies with a private L2 cache per core and a shared L3 cache. When designing a multi-level cache hierarchy, one of the key design choices is the inclusion policy: inclusive, non-inclusive or exclusive. Each choice has its benefits and drawbacks. An inclusive cache hierarchy (like Nehalem's L3) has the benefit of allowing incoming snoops to be filtered at the L3 cache, but suffers from (a) reduced space efficiency due to replication between the L2 and L3 caches and (b) reduced flexibility, since it cannot bypass the L3 cache for transient or low-priority data. In an inclusive L2/L3 cache hierarchy, it also becomes difficult to reduce the L3 cache size (or increase the L2 cache size) for different product instantiations, because inclusion can start to hurt performance through a significant number of back-invalidates. In this paper, we present a novel approach to addressing the drawbacks of inclusive caches while retaining their positive feature of snoop filtering. We present NCID: a non-inclusive cache, inclusive directory architecture that allows data in the L3 to be non-inclusive or exclusive, but retains tag inclusion in the directory to support complete snoop filtering. We then describe and evaluate a range of NCID-based architecture options and policies. Our evaluation shows that NCID enables a flexible and efficient cache hierarchy for future CMP platforms and has the potential to improve performance significantly for several important server benchmarks.
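
The core idea in the abstract, keeping tags inclusive while letting L3 data be non-inclusive, can be illustrated with a toy Python model. This is only a sketch under stated assumptions, not the authors' design: the class name `NCIDSketch`, the LRU replacement, the per-line sharer sets, and the structure sizes are all hypothetical choices made for illustration. It shows that dropping a line's data from the L3 needs no back-invalidate as long as its tag stays in the directory, and that a snoop to an untracked address is filtered.

```python
from collections import OrderedDict


class NCIDSketch:
    """Toy model (assumed, not from the paper): inclusive tag directory,
    non-inclusive L3 data array, both managed with simple LRU."""

    def __init__(self, dir_entries, data_entries):
        self.directory = OrderedDict()  # addr -> set of core ids holding the line in L2
        self.l3_data = OrderedDict()    # addr -> True when the L3 also holds the data
        self.dir_entries = dir_entries
        self.data_entries = data_entries
        self.back_invalidates = []      # (addr, core) pairs forced out of L2

    def l2_fill(self, core, addr, allocate_l3_data=True):
        """A core brings addr into its L2; the tag is always tracked."""
        if addr not in self.directory:
            if len(self.directory) >= self.dir_entries:
                self._evict_directory_entry()
            self.directory[addr] = set()
        self.directory.move_to_end(addr)
        self.directory[addr].add(core)
        if allocate_l3_data:
            if addr not in self.l3_data and len(self.l3_data) >= self.data_entries:
                # Drop only the data of the LRU line; its tag stays in the
                # directory, so no L2 copy has to be invalidated.
                self.l3_data.popitem(last=False)
            self.l3_data[addr] = True
            self.l3_data.move_to_end(addr)

    def l2_evict(self, core, addr):
        """A core drops addr from its L2; free the tag if nothing else needs it."""
        sharers = self.directory.get(addr)
        if sharers is None:
            return
        sharers.discard(core)
        if not sharers and addr not in self.l3_data:
            del self.directory[addr]

    def _evict_directory_entry(self):
        # Losing a tag breaks tag inclusion, so every L2 copy is invalidated.
        victim, sharers = self.directory.popitem(last=False)
        self.l3_data.pop(victim, None)
        for core in sorted(sharers):
            self.back_invalidates.append((victim, core))

    def snoop(self, addr):
        """Return the cores that must be probed; an empty set means filtered."""
        return set(self.directory.get(addr, ()))


if __name__ == "__main__":
    h = NCIDSketch(dir_entries=4, data_entries=1)
    h.l2_fill(0, 0xA)
    h.l2_fill(1, 0xB)  # pushes 0xA's data out of the L3, tag remains
    print(0xA in h.l3_data, h.snoop(0xA), h.back_invalidates)
```

In this sketch, passing `allocate_l3_data=False` models bypassing the L3 data array for transient or low-priority data while still tracking the tag for snoop filtering.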


Index Terms

NCID: a non-inclusive cache, inclusive directory architecture for flexible and efficient cache hierarchies
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published in
CF '10: Proceedings of the 7th ACM international conference on Computing frontiers
May 2010
370 pages
ISBN: 9781450300445
DOI: 10.1145/1787275
General Chair: Nancy M. Amato (Texas A&M University, USA)
Program Chairs: Hubertus Franke (IBM Research, USA), Paul H.J. Kelly (Imperial College London, UK)
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cache
directory

Acceptance Rates
CF '10 Paper Acceptance Rate: 30 of 113 submissions, 27%. Overall Acceptance Rate: 240 of 680 submissions, 35%.