DOI: 10.1145/1944862.1944889
research-article

Cache equalizer: a placement mechanism for chip multiprocessor distributed shared caches

Published: 24 January 2011

ABSTRACT

This paper describes Cache Equalizer (CE), a novel distributed cache management scheme for large-scale chip multiprocessors (CMPs). Our work is motivated by the large asymmetry in how heavily different cache sets are used. CE decouples the physical locations of cache blocks from their addresses in order to reduce misses caused by destructive interference. Temporal pressure at the on-chip last-level cache is continuously collected at the granularity of a group (a collection of cache sets) and periodically recorded at the memory controller to guide the placement process. An incoming block is then placed in the cache group that exhibits the minimum pressure. Results from a full-system simulator show that, for the benchmark programs we examined, CE reduces the L2 miss rate by 13.6% on average, and by as much as 46.7%, relative to a shared NUCA scheme. Our evaluation also shows that CE outperforms related cache designs.
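
The placement idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the abstract does not define how pressure is measured, so the model approximates it with per-group miss counts, and the names `PressureTracker`, `epoch_length`, `record_miss`, and `place` are hypothetical. Live counters are updated continuously, copied to a snapshot (standing in for the record kept at the memory controller) once per epoch, and an incoming block is steered to the group with the smallest recorded pressure.

```python
from dataclasses import dataclass, field


@dataclass
class PressureTracker:
    """Toy model of group-granularity pressure tracking (illustrative only)."""

    num_groups: int      # number of cache groups (each a collection of sets)
    epoch_length: int    # events between snapshots (hypothetical parameter)
    live: list = field(init=False)       # counters updated on every event
    snapshot: list = field(init=False)   # copy held at the "memory controller"
    events: int = field(init=False, default=0)

    def __post_init__(self):
        self.live = [0] * self.num_groups
        self.snapshot = [0] * self.num_groups

    def record_miss(self, group: int) -> None:
        # Assumption: pressure is approximated by the per-group miss count.
        self.live[group] += 1
        self.events += 1
        if self.events % self.epoch_length == 0:
            # End of epoch: publish the counters and start a fresh epoch.
            self.snapshot = list(self.live)
            self.live = [0] * self.num_groups

    def place(self) -> int:
        # Choose the group with minimum recorded pressure
        # (ties go to the lowest-numbered group).
        return min(range(self.num_groups), key=lambda g: self.snapshot[g])


# Usage: four groups, snapshot taken every four misses.
tracker = PressureTracker(num_groups=4, epoch_length=4)
for g in [0, 0, 1, 3]:
    tracker.record_miss(g)

# Group 2 saw no misses during the epoch, so it receives the next block.
chosen_group = tracker.place()

# Because location is decoupled from the address, the chosen group has to be
# remembered so the block can be found later (a dict stands in for that here).
block_location = {0x1F40: chosen_group}
```

Placement reads only the snapshot, so decisions between epoch boundaries rely on slightly stale pressure values. That mirrors the abstract's split between continuous collection and periodic recording.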

Published in

HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
January 2011
226 pages
ISBN: 9781450302418
DOI: 10.1145/1944862

Copyright © 2011 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
