DOI: 10.1145/1146909.1147085
Article

Configurable cache subsetting for fast cache tuning

Published: 24 July 2006

Abstract

Numerous variations of configurable caches, with variable parameters such as total size, line size, and associativity, have been proposed in commercial microprocessors in recent years. Tuning a configurable cache to a target application has been shown to reduce memory-access power by over 50%. However, searching the configuration space for the best configuration can require considerable time or power, even when using recent cache tuning heuristics. We sought to determine, for a particular domain of applications, the smallest subset of cache configurations that would still enable effective tuning. For a suite of 34 benchmarks and a cache with 18 possible configurations, we determine, through an exhaustive search of all possible subsets, that only 3 or 4 candidate configurations are necessary to support tuning. We introduce a new heuristic, adapted from an efficient and effective heuristic developed for data mining, to quickly determine the best configurations for a subset of any size, with near-optimal results. We then consider a configurable cache with 17,640 possible configurations and improve our heuristic to include a pre-pruning step, yielding near-optimal tuning results. We conclude that only 3 or 4 possible cache configurations are needed to offer a near-optimal configuration for every benchmark in our suite, resulting in a 91% reduction in design space exploration time over a state-of-the-art cache tuning heuristic.
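The exhaustive subset search described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' method: the energy table is made-up data, the names `subset_cost` and `best_subset` are hypothetical, and the objective (total energy when each benchmark uses the best configuration available in the subset) is one plausible reading of "effective tuning". With 18 configurations there are only 2^18 = 262,144 subsets, so enumeration is feasible; for 17,640 configurations it is not, which is why a heuristic is needed there.

```python
from itertools import combinations


def subset_cost(energy, subset):
    """Total energy when every benchmark runs on its best configuration in `subset`.

    `energy[b][c]` is the (hypothetical) energy of benchmark b on configuration c.
    """
    subset = tuple(subset)
    return sum(min(row[c] for c in subset) for row in energy)


def best_subset(energy, k):
    """Exhaustively try every size-k subset of configuration indices."""
    n_configs = len(energy[0])
    return min(combinations(range(n_configs), k),
               key=lambda s: subset_cost(energy, s))


# Made-up energy table: 4 benchmarks (rows) x 4 configurations (columns).
energy = [
    [5, 9, 7, 8],
    [9, 4, 8, 7],
    [6, 8, 3, 9],
    [5, 9, 8, 6],
]

# Cost when every configuration is available (the tuning optimum).
full_cost = subset_cost(energy, range(len(energy[0])))

for k in (1, 2, 3):
    chosen = best_subset(energy, k)
    overhead = subset_cost(energy, chosen) / full_cost - 1.0
    print(f"k={k}: configs {chosen}, overhead vs. full space {overhead:.1%}")
```

In this toy table a subset of three configurations already matches the cost of the full space, which mirrors the kind of result the abstract reports, although the numbers themselves carry no meaning.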

    Published In

    DAC '06: Proceedings of the 43rd annual Design Automation Conference
    July 2006
    1166 pages
    ISBN:1595933816
    DOI:10.1145/1146909

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cache optimization
    2. configurable cache tuning
    3. low energy

    Qualifiers

    • Article

    Conference

    DAC06
    DAC06: The 43rd Annual Design Automation Conference 2006
    July 24 - 28, 2006
    San Francisco, CA, USA

    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
