skip to main content
10.1145/1815961.1815996acmconferencesArticle/Chapter ViewAbstractPublication PagesiscaConference Proceedingsconference-collections
research-article

Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications

Published:19 June 2010Publication History

ABSTRACT

Extracting performance from modern parallel architectures requires that applications be divided into many different threads of execution. Unfortunately selecting the appropriate number of threads for an application is a daunting task. Having too many threads can quickly saturate shared resources, such as cache capacity or memory bandwidth, thus degrading performance. On the other hand, having too few threads makes inefficient use of the resources available. Beyond static resource assignment, the program inputs and dynamic system state (e.g., what other applications are executing in the system) can have a significant impact on the right number of threads to use for a particular application.

To address this problem we present the Thread Tailor, a dynamic system that automatically adjusts the number of threads in an application to optimize system efficiency. The Thread Tailor leverages offline analysis to estimate what type of threads will exist at runtime and the communication patterns between them. Using this information Thread Tailor dynamically combines threads to better suit the needs of the target system. Thread Tailor adjusts not only to the architecture, but also other applications in the system, and this paper demonstrates that this type of adjustment can lead to significantly better use of thread-level parallelism in real-world architectures.

References

  1. L. O. Andersen. Program analysis and specialization for the c programming language. Technical report, University of Copenhagen, 1994.Google ScholarGoogle Scholar
  2. C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, Oct. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M. Chu, R. Ravindran, and S. Mahlke. Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures. In Proc. of the 40th Annual International Symposium on Microarchitecture, pages 369--378, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. J. Corbalan, X. Martorell, and J. Labarta. Performance-driven processor allocation. In Proc. of the 2000 International Symposium on on Operating Systems Design and Implementation, pages 255--266, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Curtis-Maury, J. Dzierwa, C. D. Antonopoulos, and D. S. Nikolopoulos. Online power-performance adaptation of multithreaded programs using hardware event-based prediction. In Proc. of the 2006 International Conference on Supercomputing, pages 157--166, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. L. Dagum and R. Menon. OpenMP: an industry standard API for shared-memory programming. IEEE Computer Science and Engineering, 5(1):46--55, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Y. Ding, M. Kandemir, P. Raghavan, and M. Irwin. A helper thread based edp reduction scheme for adapting application execution in cmps. In Proc. of the 2008 IEEE Symposium on on Parallel and Distributed Processing, pages 1--14, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  8. C. Fiduccia and R. Mattheyses. A linear time heuristic for improving network partitions. In Proc. of the 19th Design Automation Conference, pages 175--181, 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Ghuloum. Ct: C for throughput computing (whitepaper), 2009. http://techresearch.intel.com/articles/Tera-Scale/1514.htm.Google ScholarGoogle Scholar
  11. M. Hirzel, D. von Dincklage, A. Diwan, and M. Hind. Fast online pointer analysis. ACM Transactions on Programming Languages and Systems, 29(2):11, Apr. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. C. Jung, D. Lim, J. Lee, and S. Han. Adaptive execution techniques for SMT multiprocessor architectures. In Proc. of the 10th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 236--246, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. B. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal, 49(2):291--207, Feb. 1970.Google ScholarGoogle ScholarCross RefCross Ref
  14. M. Kudlur and S. Mahlke. Orchestrating the execution of stream programs on multicore platforms. In Proc. of the SIGPLAN '08 Conference on Programming Language Design and Implementation, pages 114--124, June 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Kulkarni et al. Optimistic Parallelism Requires Abstractions. In Proc. of the SIGPLAN '07 Conference on Programming Language Design and Implementation, pages 211--222, June 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. R. Kumar, G. Agrawal, and G. Gao. Compiling several classes of communication patterns on a multithreaded architecture. In Proc. of the 2002 IEEE Symposium on on Parallel and Distributed Processing, pages 18--23, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. R. Kumar et al. Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance. In Proc. of the 31st Annual International Symposium on Computer Architecture, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. R. Kumar, K. I. Farkas, N. P. Jouppi, P. Ranganathan, and D. M. Tullsen. Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction. In Proc. of the 36th Annual International Symposium on Microarchitecture, pages 81--92, Dec. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. N. Lakshiminarayana, S. Rao, and H. Kim. Asymmetricity Aware Scheduling Algorithms for Asymmetric Processors. In Workshop on the Interaction between Operating Systems and Computer Architecture, 2009.Google ScholarGoogle Scholar
  20. C. Lattner and V. Adve. LLVM: A Compilation Framework for Lifelong Program Analysis and Transformation. In Proc. of the 2004 International Symposium on Code Generation and Optimization, pages 75--86, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Li and J. Martinez. Dynamic power-performance adaptation of parallel computation on chip multiprocessors. In Proc. of the 12th International Symposium on High-Performance Computer Architecture, pages 77--87, 2006.Google ScholarGoogle Scholar
  22. U. Nawathe et al. An 8-core, 64-thread, 64-bit, power efficient SPARC SoC (Niagara2), Feb. 2007. In Proc. of ISSCC.Google ScholarGoogle Scholar
  23. J. Nieplocha et al. Evaluating the potential of multithreaded platforms for irregular scientific computations. In Proc. of the 2007 ACM Conference on Computing Frontiers, pages 47--58, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Nvidia. CUDA Programming Guide, June 2007. http://developer.download.nvidia.com/compute/cuda.Google ScholarGoogle Scholar
  25. K. Olukotun et al. The case for a single chip multiprocessor. In Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, pages 2--11, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. M. Qureshi and Y. Patt. Partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proc. of the 39th Annual International Symposium on Microarchitecture, pages 423--432, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. R. Ravindran, R. Senger, E. Marsman, G. Dasika, M. Guthaus, S. Mahlke, and R. Brown. Increasing the Number of Effective Registers in a Low-Power Processor Using a Windowed Register File. In Proc. of the 2003 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, pages 125--136, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. H. Schwartz, D. Marpe, and T. Wiegand. Overview of the scalable video coding extension of the h.264/avc standard. IEEE Transactions on Circuits and Systems for Video Technology, 17(9):1103--1120, Sept. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. M. A. Suleman, M. Qureshi, and Y. Patt. Feedback Driven Threading: Power-Efficient and High-Performance Execution of Multithreaded Workloads on CMPs. In 16th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 277--286, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. P. Wang et al. EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system. In Proc. of the SIGPLAN '07 Conference on Programming Language Design and Implementation, pages 156--166, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. S. Woo, M. Ohara, E. Torrie, J. Singh, and A. Gupta. The SPLASH-2 programs: Characterization and methodological considerations. In Proceedings of the 22nd annual international symposium on Computer architecture, pages 24--36, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Q. Wu et al. A dynamic compilation framework for controlling microprocessor energy and performance. In Proc. of the 38th Annual International Symposium on Microarchitecture, pages 271--282, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Y. Xie and G. H. Loh. PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches. In Proc. of the 36th Annual International Symposium on Computer Architecture, pages 174--183, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture
            June 2010
            520 pages
            ISBN:9781450300537
            DOI:10.1145/1815961
            • cover image ACM SIGARCH Computer Architecture News
              ACM SIGARCH Computer Architecture News  Volume 38, Issue 3
              ISCA '10
              June 2010
              508 pages
              ISSN:0163-5964
              DOI:10.1145/1816038
              Issue’s Table of Contents

            Copyright © 2010 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 19 June 2010

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            Overall Acceptance Rate543of3,203submissions,17%

            Upcoming Conference

            ISCA '24

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader