DOI: 10.1145/2593069.2593240
research-article

Variation Aware Cache Partitioning for Multithreaded Programs

Published: 01 June 2014

ABSTRACT

Multithreaded programs are commonly written and optimized for homogeneous multi-core processors, assuming equal performance from all the cores. This assumption greatly simplifies the partitioning and balancing of an application's workload across threads; however, it no longer holds when the frequencies of the cores differ due to within-die variations, leading to a degradation in performance. We observe that, in addition to the frequency of the core that it executes on, the performance of a thread also depends on the share of shared system resources, such as the last-level cache, that it receives. We propose variation-aware cache partitioning as an approach to redress the variation-induced imbalance in the execution times of threads, thereby improving the performance of multithreaded programs. We discuss the challenges involved in realizing our proposal, including synchronization (e.g., barriers) across threads, which results in faster threads being limited by slower threads; the complex and non-linear relationship between a thread's performance and the cache capacity allocated to it; and the fact that different program phases can respond quite differently to varying cache capacity. We propose a runtime scheme to perform spatio-temporal cache partitioning while considering both chip characteristics (frequency variations) and program characteristics. We evaluate the proposed technique by applying it to an ensemble of variation-impacted multi-cores executing multithreaded programs from the PARSEC and SPEC-OMP suites, and demonstrate that it results in an average performance improvement of 15% by mitigating the impact of frequency variations.
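
The core idea in the abstract, giving more cache to threads on slower cores so that all threads reach a barrier at about the same time, can be illustrated with a small sketch. This is not the authors' runtime scheme, which the abstract does not specify; it is a simple greedy heuristic under stated assumptions. The names `partition_ways` and `barrier_time`, the per-thread cycle models, and the numbers in the usage note are all hypothetical. Each thread's time is estimated as cycles divided by core frequency, and each remaining cache way goes to the thread that is currently slowest.

```python
from typing import Callable, List

# Hypothetical type: maps a number of cache ways to an estimated cycle count.
CycleModel = Callable[[int], float]


def thread_time(freq_ghz: float, model: CycleModel, ways: int) -> float:
    """Estimated time of one thread: cycles at this allocation / core frequency."""
    return model(ways) / freq_ghz


def barrier_time(freqs_ghz: List[float], models: List[CycleModel],
                 alloc: List[int]) -> float:
    """Time to reach a barrier is set by the slowest thread."""
    return max(thread_time(f, m, w)
               for f, m, w in zip(freqs_ghz, models, alloc))


def partition_ways(freqs_ghz: List[float], models: List[CycleModel],
                   total_ways: int) -> List[int]:
    """Greedy sketch (an assumption, not the paper's algorithm).

    Start with one way per thread, then hand each remaining way to the
    thread with the longest estimated time, i.e. the one the barrier
    would wait on.
    """
    n = len(freqs_ghz)
    if len(models) != n:
        raise ValueError("one cycle model per thread is required")
    if total_ways < n:
        raise ValueError("need at least one way per thread")

    alloc = [1] * n
    for _ in range(total_ways - n):
        slowest = max(
            range(n),
            key=lambda i: thread_time(freqs_ghz[i], models[i], alloc[i]),
        )
        alloc[slowest] += 1
    return alloc
```

As a usage example, take two threads with the same made-up cycle model `lambda w: 1000.0 + 800.0 / w`, cores at 2.0 GHz and 1.0 GHz, and 8 ways. The greedy pass returns `[1, 7]`, and the estimated barrier time is lower than with an even `[4, 4]` split. A greedy pass like this can be misled by non-monotonic or strongly non-linear cache sensitivity, and it ignores program phases, both of which the abstract names as challenges.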

      Published in

        DAC '14: Proceedings of the 51st Annual Design Automation Conference
        June 2014
        1249 pages
        ISBN: 9781450327305
        DOI: 10.1145/2593069

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%
