skip to main content
10.1145/2370816.2370868acmconferencesArticle/Chapter ViewAbstractPublication PagespactConference Proceedingsconference-collections
research-article

The evicted-address filter: a unified mechanism to address both cache pollution and thrashing

Published:19 September 2012Publication History

ABSTRACT

Off-chip main memory has long been a bottleneck for system performance. With increasing memory pressure due to multiple on-chip cores, effective cache utilization is important. In a system with limited cache space, we would ideally like to prevent 1) cache pollution, i.e., blocks with low reuse evicting blocks with high reuse from the cache, and 2) cache thrashing, i.e., blocks with high reuse evicting each other from the cache.

In this paper, we propose a new, simple mechanism to predict the reuse behavior of missed cache blocks in a manner that mitigates both pollution and thrashing. Our mechanism tracks the addresses of recently evicted blocks in a structure called the Evicted-Address Filter (EAF). Missed blocks whose addresses are present in the EAF are predicted to have high reuse and all other blocks are predicted to have low reuse. The key observation behind this prediction scheme is that if a block with high reuse is prematurely evicted from the cache, it will be accessed soon after eviction. We show that an EAF-implementation using a Bloom filter, which is cleared periodically, naturally mitigates the thrashing problem by ensuring that only a portion of a thrashing working set is retained in the cache, while incurring low storage cost and implementation complexity.

We compare our EAF-based mechanism to five state-of-the-art mechanisms that address cache pollution or thrashing, and show that it provides significant performance improvements for a wide variety of workloads and system configurations.

References

  1. AMD Phenom II key architectural features. http://goo.gl/iQBfK.Google ScholarGoogle Scholar
  2. Intel next generation microarchitecture. http://goo.gl/3eskx.Google ScholarGoogle Scholar
  3. Oracle SPARC T4. http://goo.gl/KZSnc.Google ScholarGoogle Scholar
  4. Wind River Simics. www.windriver.com/producs/simics.Google ScholarGoogle Scholar
  5. S. Bansal and D. S. Modha. CAR: Clock with adaptive replacement. In FAST, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. ACM Communications, 13, July 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Chang and G. S. Sohi. Cooperative cache partitioning for chip multiprocessors. In ICS, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Y. Chen, A. Kumar, and J. Xu. A new design of Bloom Vlter for packet inspection speedup. In GLOBECOM, 2007.Google ScholarGoogle Scholar
  9. J. Collins and D. M. Tullsen. Hardware identiVcation of cache conflict misses. In MICRO, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. C. Ding and K. Kennedy. Improving eUective bandwidth through compiler enhancement of global cache reuse. JPDC, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Eyerman and L. Eeckhout. System-level performance metrics for multiprogram workloads. IEEE Micro, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. Goel and P. Gupta. Small subset queries and Bloom Vlters using ternary associative memories, with applications. In SIGMETRICS, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. E. G. Hallnor and S. K. Reinhardt. A fully associative software managed cache design. In ISCA, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Intel. Intel 64 and IA-32 architectures software developer's manuals, 2011.Google ScholarGoogle Scholar
  15. R. Iyer. CQoS: A framework for enabling QoS in shared caches of CMP platforms. In ICS, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Jaleel, W. Hasenplaugh, M. Qureshi, J. Sebot, S. Steely, Jr., and J. Emer. Adaptive insertion policies for managing shared caches. In PACT, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. Jaleel, K. B. Theobald, S. C. Steely, Jr., and J. Emer. High performance cache replacement using re-reference interval prediction. In ISCA, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Jiang and X. Zhang. LIRS: An eXcient low inter-reference recency set replacement policy to improve buffer cache performance. In SIGMETRICS, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. T. Johnson, D. Connors, M. Merten, and W.-M. Hwu. Run-time cache bypassing. IEEE TC, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. T. Johnson and D. Shasha. 2Q: A low overhead high performance buffer management replacement algorithm. In VLDB, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. N. P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In ISCA, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. R. Kalla, B. Sinharoy, W. Starke, and M. Floyd. Power7: IBM's next-generation server processor. IEEE Micro, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. G. Keramidas, P. Petoumenos, and S. Kaxiras. Cache replacement based on reuse-distance prediction. In ICCD, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  24. S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In PACT, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Y. Kim, D. Han, O. Mutlu, and M. Harchol-Balter. ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers. In HPCA, 2010.Google ScholarGoogle Scholar
  26. Y. Kim, M. Papamichael, O. Mutlu, and M. Harchol-Balter. Thread cluster memory scheduling: Exploiting differences in memory access behavior. In MICRO, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. H. Liu, M. Ferdman, J. Huh, and D. Burger. Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency. In MICRO, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. K. Luo, J. Gummaraju, and M. Franklin. Balancing thoughput and fairness in SMT processors. In ISPASS, 2001.Google ScholarGoogle Scholar
  29. M. J. Lyons and D. Brooks. The design of a Bloom Vlter hardware accelerator for ultra low power systems. In ISLPED, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. N. Megiddo and D. S. Modha. ARC: A self-tuning, low overhead replacement cache. In FAST, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. K. J. Nesbit, J. Laudon, and J. E. Smith. Virtual private caches. In ISCA, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. E. J. O'Neil, P. E. O'Neil, and G. Weikum. The LRU-K page replacement algorithm for database disk buffering. In SIGMOD, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J.-K. Peir, S.-C. Lai, S.-L. Lu, J. Stark, and K. Lai. Bloom filtering cache misses for accurate data speculation and prefetching. In ICS, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. T. Piquet, O. Rochecouste, and A. Seznec. Exploiting single-usage for effective memory management. In ACSAC, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. M. K. Qureshi, A. Jaleel, Y. Patt, S. Steely, and J. Emer. Adaptive insertion policies for high performance caching. In ISCA, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. M. K. Qureshi, D. N. Lynch, O. Mutlu, and Y. N. Patt. A case for MLP-aware cache replacement. In ISCA, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. M. K. Qureshi and Y. N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. M. K. Qureshi, D. Thompson, and Y. N. Patt. The V-way cache: Demand based associativity via global replacement. In ISCA, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. K. Rajan and G. Ramaswamy. Emulating optimal replacement with a shepherd cache. In MICRO, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. R. Rajwar, M. Herlihy, and K. Lai. Virtualizing transactional memory. In ISCA, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. M. V. Ramakrishna, E. Fu, and E. Bahcekapili. Efficient hardware hashing functions for high performance computers. IEEE TC, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. G. Rivera and C.-W. Tseng. Compiler optimizations for eliminating cache conWict misses. Technical Report UMIACS-TR-97-59, University of Maryland, College Park, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. J. Rivers and E. S. Davidson. Reducing conflicts in direct-mapped caches with a temporality-based design. In ICPP, 1996.Google ScholarGoogle ScholarCross RefCross Ref
  44. D. Rolan, B. B. Fraguela, and R. Doallo. Adaptive line placement with the set balancing cache. In MICRO, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. D. Rolan, B. B. Fraguela, and R. Doallo. Reducing capacity and conflict misses using set saturation levels. In HiPC, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  46. D. Sanchez and C. Kozyrakis. The ZCache: Decoupling ways and associativity. In MICRO, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. V. Seshadri, O. Mutlu, M. A. Kozuch, and T. C. Mowry. The Evicted-Address Filter: A uniVed mechanism to address both cache pollution and thrashing. Technical Report SAFARI 2012-002, CMU, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. A. Seznec. A case for two-way skewed-associative caches. In ISCA, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. T. Sherwood, E. Perelman, G. Hamerly, and B. Calder. Automatically characterizing large scale program behavior. In ASPLOS, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. A. Snavely and D. M. Tullsen. Symbiotic jobscheduling for a simultaneous multithreaded processor. In ASPLOS, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. H. S. Stone, J. Turek, and J. L. Wolf. Optimal partitioning of cache memory. IEEE TC, Sep. 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. G. E. Suh, S. Devadas, and L. Rudolph. A new memory monitoring scheme for memory-aware scheduling and partitioning. In HPCA, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. G. S. Tyson, M. K. Farrens, J. Matthews, and A. R. Pleszkun. A modified approach to data cache management. In MICRO, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Z. Wang, K. S. McKinley, A. L. Rosenberg, and C. C. Weems. Using the compiler to improve cache replacement decisions. In PACT, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. C.-J. Wu, A. Jaleel, W. Hasenplaugh, M. Martonosi, S. Steely Jr., and J. Emer. SHIP: Signature-based hit predictor for high performance caching. In MICRO, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Y. Xie and G. H. Loh. PIPP: Promotion/insertion pseudo-partitioning of multi-core shared caches. In ISCA, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. D. Zhan, H. Jiang, and S. C. Seth. STEM: Spatiotemporal management of capacity for intra-core last level caches. In MICRO, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. The evicted-address filter: a unified mechanism to address both cache pollution and thrashing

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
      September 2012
      512 pages
      ISBN:9781450311823
      DOI:10.1145/2370816

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 19 September 2012

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate121of471submissions,26%

      Upcoming Conference

      PACT '24
      International Conference on Parallel Architectures and Compilation Techniques
      October 14 - 16, 2024
      Southern California , CA , USA

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader