DOI: 10.1145/2854038.2854047
research-article

Portable performance on asymmetric multicore processors

Published: 29 February 2016

ABSTRACT

Static and dynamic power constraints are steering chip manufacturers to build single-ISA Asymmetric Multicore Processors (AMPs) with big and small cores. To deliver on their energy efficiency potential, schedulers must consider core sensitivity, load balance, and the critical path. Applying these criteria effectively is challenging, especially for complex and non-scalable multithreaded applications. We demonstrate that runtimes for managed languages, which are now ubiquitous, provide a unique opportunity to abstract over AMP complexity and inform scheduling with rich semantics such as thread priorities, locks, and parallelism, information that is not directly available to the hardware, OS, or application. We present the WASH AMP scheduler, which (1) automatically identifies and accelerates critical threads in concurrent but non-scalable applications; (2) respects thread priorities; (3) considers core availability and thread sensitivity; and (4) proportionally schedules threads on big and small cores to optimize performance and energy. We introduce new dynamic analyses that identify critical threads and classify applications as sequential, scalable, or non-scalable. Compared to prior work, WASH improves performance by 20% and energy by 9% or more on frequency-scaled AMP hardware (not simulation). Performance advantages grow to 27% when asymmetry increases, and they are robust to a complex multithreaded adversary independently scheduled by the OS. WASH effectively identifies and optimizes a wider class of workloads than prior work.
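The abstract names the signals WASH uses (thread priorities, locks, parallelism) but not the analyses themselves. The following is only a toy sketch of how a runtime might classify a workload as sequential, scalable, or non-scalable and then pick threads for the big cores. It is not the paper's algorithm: the `ThreadStats` fields, the threshold values, and the ranking rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThreadStats:
    """Hypothetical per-thread counters a managed runtime could collect."""
    name: str
    priority: int             # larger means more important (assumed convention)
    run_ms: float             # time spent running in the last interval
    blocked_others_ms: float  # time other threads waited on locks this thread held


def classify(threads: List[ThreadStats],
             active_fraction: float = 0.1,
             contention_fraction: float = 0.2) -> str:
    """Label the workload; both thresholds are illustrative guesses."""
    total_run = sum(t.run_ms for t in threads)
    if total_run == 0:
        return "sequential"
    busiest = max(t.run_ms for t in threads)
    # A thread counts as active if it ran a meaningful share of the busiest one.
    active = [t for t in threads if t.run_ms >= active_fraction * busiest]
    if len(active) <= 1:
        return "sequential"
    # Heavy lock-induced waiting suggests the threads do not scale.
    total_blocking = sum(t.blocked_others_ms for t in threads)
    if total_blocking >= contention_fraction * total_run:
        return "non-scalable"
    return "scalable"


def pick_big_core_threads(threads: List[ThreadStats], big_cores: int) -> List[str]:
    """Return names of threads to favor on big cores (assumed policy)."""
    if classify(threads) == "scalable":
        # No thread is singled out; a real scheduler would share big cores fairly.
        return []
    ranked = sorted(
        threads,
        key=lambda t: (t.priority, t.blocked_others_ms, t.run_ms),
        reverse=True,
    )
    return [t.name for t in ranked[:big_cores]]


if __name__ == "__main__":
    sample = [
        ThreadStats("main", 5, 100.0, 60.0),
        ThreadStats("worker-1", 5, 90.0, 5.0),
        ThreadStats("worker-2", 5, 80.0, 0.0),
    ]
    print(classify(sample), pick_big_core_threads(sample, big_cores=1))
```

Ranking by priority first mirrors the abstract's claim that the scheduler respects thread priorities. Using the lock-holding time that delayed other threads as a criticality proxy is a simplification chosen here, not something the abstract specifies.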


    • Published in

      CGO '16: Proceedings of the 2016 International Symposium on Code Generation and Optimization
      February 2016
      283 pages
      ISBN: 9781450337786
      DOI: 10.1145/2854038

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      CGO '16 Paper Acceptance Rate: 25 of 108 submissions, 23%
      Overall Acceptance Rate: 312 of 1,061 submissions, 29%
