ABSTRACT
The gap between processing and storage speeds remains a concern for computer system designers and application developers. This disparity can be bridged in part by eliminating unnecessary stores, thereby reducing the amount of traffic that flows from the processor and first-level caches to the slower components of the storage subsystem. Reducing this "write" traffic can improve program performance, save power, and increase the longevity of storage components that have limited write endurance. Techniques have been proposed and evaluated for identifying various classes of stores that can be silenced. A relatively unexplored class comprises stores that would write data that is dirty but dead. Such data appears to need writing back from cache to memory, yet it can be proven that the application can never subsequently access it.
In this paper, we suggest identifying garbage (trash) in cache, so that the dirty bytes associated with the trash need not be written to memory. We propose and evaluate a simple technique based on reference counting that finds a subset of these "eternally silent" (dead) stores. When applied to popular benchmarks, our results show that a significant fraction of the writes to memory can be silenced based on the impossibility of an application subsequently accessing the data.
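The idea can be illustrated with a toy model. The sketch below is not the paper's implementation: the class `ToyCache`, its one-object-per-line layout, its LRU policy, and the method names are all assumptions made for illustration. It shows only the core decision, that a dirty line whose object has a reference count of zero can be discarded at eviction instead of written back. Because reference counting misses unreachable cycles, a scheme like this would find only a subset of dead data.

```python
from collections import OrderedDict


class ToyCache:
    """Hypothetical fully associative write-back cache with LRU eviction.

    Simplifying assumption: each cache line holds exactly one object.
    """

    def __init__(self, capacity, silence_dead=True):
        self.capacity = capacity
        self.silence_dead = silence_dead
        self.lines = OrderedDict()  # obj_id -> dirty flag, oldest first
        self.refcount = {}          # obj_id -> number of live references
        self.writebacks = 0         # dirty lines actually written to memory
        self.silenced = 0           # dirty lines dropped because they were dead

    def add_ref(self, obj_id):
        self.refcount[obj_id] = self.refcount.get(obj_id, 0) + 1

    def drop_ref(self, obj_id):
        self.refcount[obj_id] -= 1

    def store(self, obj_id):
        """Write to an object, marking its cache line dirty."""
        if obj_id in self.lines:
            self.lines.move_to_end(obj_id)   # refresh LRU position
        elif len(self.lines) >= self.capacity:
            self._evict()
        self.lines[obj_id] = True

    def _evict(self):
        victim, dirty = self.lines.popitem(last=False)  # least recently used
        if not dirty:
            return
        if self.silence_dead and self.refcount.get(victim, 0) == 0:
            self.silenced += 1    # dirty but dead: no write-back needed
        else:
            self.writebacks += 1  # still reachable: must reach memory


def run(silence_dead):
    """Allocate six objects; the even-numbered ones die right after their first store."""
    cache = ToyCache(capacity=2, silence_dead=silence_dead)
    for obj_id in range(6):
        cache.add_ref(obj_id)
        cache.store(obj_id)
        if obj_id % 2 == 0:
            cache.drop_ref(obj_id)
    return cache.writebacks, cache.silenced


if __name__ == "__main__":
    print("baseline (writebacks, silenced):", run(silence_dead=False))
    print("silencing (writebacks, silenced):", run(silence_dead=True))
```

In this contrived workload, half of the evicted dirty lines belong to dead objects, so the silencing configuration performs half as many write-backs as the baseline. The ratio is an artifact of the example and says nothing about the benchmark results reported in the paper.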