skip to main content
10.1145/1736020.1736030acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
research-article

Speculative parallelization using software multi-threaded transactions

Published:13 March 2010Publication History

ABSTRACT

With the right techniques, multicore architectures may be able to continue the exponential performance trend that elevated the performance of applications of all types for decades. While many scientific programs can be parallelized without speculative techniques, speculative parallelism appears to be the key to continuing this trend for general-purpose applications. Recently-proposed code parallelization techniques, such as those by Bridges et al. and by Thies et al., demonstrate scalable performance on multiple cores by using speculation to divide code into atomic units (transactions) that span multiple threads in order to expose data parallelism. Unfortunately, most software and hardware Thread-Level Speculation (TLS) memory systems and transactional memories are not sufficient because they only support single-threaded atomic units. Multi-threaded Transactions (MTXs) address this problem, but they require expensive hardware support as currently proposed in the literature. This paper proposes a Software MTX (SMTX) system that captures the applicability and performance of hardware MTX, but on existing multicore machines. The SMTX system yields a harmonic mean speedup of 13.36x on native hardware with four 6-core processors (24 cores in total) running speculatively parallelized applications.

References

  1. M. Abadi, T. Harris, and M. Mehrara. Transactional memory with strong atomicity using off-the-shelf memory protection hardware. In PPoPP '09: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 185--196. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A.-R. Adl-Tabatabai, B. T. Lewis, V. Menon, B. R. Murphy, B. Saha, and T. Shpeisman. Compiler and runtime support for efficient software transactional memory. In PLDI '06: Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 26--37. ACM, June 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Allen and K. Kennedy. Optimizing compilers for modern architectures: A dependence-based approach. Morgan Kaufmann Publishers Inc., 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: safe multithreaded programming for C/C++. In OOPSLA '09: Proceeding of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, pages 81--96. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Bhowmik and M. Franklin. A general compiler framework for speculative multithreading. In Proceedings of the 14th ACM Symposium on Parallel Algorithms and Architectures, pages 99--108, August 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: characterization and architectural implications. In PACT '08: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, pages 72--81. ACM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Bridges, N. Vachharajani, Y. Zhang, T. Jablin, and D. August. Revisiting the sequential programming model for multi-core. In MICRO'07: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, pages 69--84. IEEE Computer Society, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. J. Bridges. The VELOCITY Compiler: Extracting Efficient Multicore Execution from Legacy Sequential Codes. PhD thesis, Department of Computer Science, Princeton University, Princeton, New Jersey, United States, November 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. Cascaval, C. Blundell, M. Michael, H. W. Cain, P. Wu, S. Chiras, and S. Chatterjee. Software transactional memory: Why is it only a research toy? Queue, 6(5):46--58, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. Software behavior oriented parallelization. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 223--234. ACM, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. J. Garzarán, M. Prvulovic, J. M. Llabería, V. Vi Ünals, L. Rauchwerger, and J. Torrellas. Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors. ACM Transactions on Architecture Code Optimization, 2(3):247--279, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Giacomoni, T. Moseley, and M. Vachharajani. FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue. In PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 43--52. ACM, February 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. GNU Image Manipulation Program. http://www.gimp.org. K. Kelsey, T. Bai, C. Ding, and C. Zhang. Fast Track: A software system for speculative program optimization. In CGO '09: Proceedings of the 2009 International Symposium on Code Generation and Optimization, pages 157--168. IEEE Computer Society, May 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 211--222. ACM, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Larus and R. Rajwar. Transactional Memory (Synthesis Lectures on Computer Architecture). Morgan & Claypool Publishers, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In CGO '04: Proceedings of the International Symposium on Code Generation and Optimization, page 75. IEEE Computer Society, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. Mehrara, J. Hao, P.-C. Hsu, and S. Mahlke. Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory. In PLDI '09: Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 166--176. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. C. E. Oancea and A. Mycroft. Software thread-level speculation: an optimistic library implementation. In IWMSE '08: Proceedings of the 1st International Workshop on Multicore Software Engineering, pages 23--32. ACM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. Olszewski, J. Cutler, and J. G. Steffan. JudoSTM: A dynamic binary-rewriting approach to software transactional memory. In PACT'07: Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, pages 365--375. IEEE Computer Society, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. G. Ottoni, R. Rangan, A. Stoler, and D. I. August. Automatic thread extraction with decoupled software pipelining. In MICRO '05: Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture, pages 105--118. IEEE Computer Society, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. E. Raman. Parallelization Techniques with Improved Dependence Handling. PhD thesis, Department of Computer Science, Princeton University, Princeton, New Jersey, United States, June 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. E. Raman, N. Vachharajani, R. Rangan, and D. I. August. Spice: speculative parallel iteration chunk execution. In CGO '08: Proceedings of the 2008 International Symposium on Code Generation and Optimization, pages 175--184. ACM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. L. Rauchwerger and D. A. Padua. The LRPD test: Speculative runtime parallelization of loops with privatization and reduction parallelization. IEEE Transactions on Parallel and Distributed Systems, 10(2):160---180, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. P. Rundberg and P. Stenstrom. An all-software thread-level data dependence speculation system for multiprocessors. Journal of Instruction-Level Parallelism, 3, October 2001.Google ScholarGoogle Scholar
  25. N. Shavit and D. Touitou. Software transactional memory. In PODC'95: Proceedings of the Fourteenth Annual ACM Symposium on Principles of Distributed Computing, pages 204--213. ACM, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Standard Performance Evaluation Corporation (SPEC). http://www.spec.org.Google ScholarGoogle Scholar
  27. J. G. Steffan, C. Colohan, A. Zhai, and T. C. Mowry. The STAMPede approach to thread-level speculation. ACM Transactions on Computer Systems, 23(3):253--300, February 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. W. Thies, V. Chandrasekhar, and S. Amarasinghe. A practical approach to exploiting coarse-grained pipeline parallelism in C programs. In MICRO '07: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, pages 356--369. IEEE Computer Society, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. C. Tian, M. Feng, V. Nagarajan, and R. Gupta. Copy or discard execution model for speculative parallelization on multicores. In MICRO '08: Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture, pages 330--341. IEEE Computer Society, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. N. Vachharajani. Intelligent Speculation for Pipelined Multithreading. PhD thesis, Department of Computer Science, Princeton University, Princeton, New Jersey, United States, November 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. N. Vachharajani, R. Rangan, E. Raman, M. J. Bridges, G. Ottoni, and D. I. August. Speculative decoupled software pipelining. In PACT'07: Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, pages 49--59. IEEE Computer Society, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. H. Zhong, M. Mehrara, S. Lieberman, and S. Mahlke. Uncovering hidden loop level parallelism in sequential applications. In HPCA '08: Proceedings of the 14th International Symposium on High-Performance Computer Architecture, 2008.Google ScholarGoogle Scholar
  33. C. Zilles and G. Sohi. Master/slave speculative parallelization. In MICRO'02: Proceedings of the 35th Annual ACM/IEEE International Symposium on Microarchitecture, pages 85--96. IEEE Computer Society Press, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Speculative parallelization using software multi-threaded transactions

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems
        March 2010
        422 pages
        ISBN:9781605588391
        DOI:10.1145/1736020
        • General Chair:
        • James C. Hoe,
        • Program Chair:
        • Vikram S. Adve
        • cover image ACM SIGARCH Computer Architecture News
          ACM SIGARCH Computer Architecture News  Volume 38, Issue 1
          ASPLOS '10
          March 2010
          399 pages
          ISSN:0163-5964
          DOI:10.1145/1735970
          Issue’s Table of Contents
        • cover image ACM SIGPLAN Notices
          ACM SIGPLAN Notices  Volume 45, Issue 3
          ASPLOS '10
          March 2010
          399 pages
          ISSN:0362-1340
          EISSN:1558-1160
          DOI:10.1145/1735971
          Issue’s Table of Contents

        Copyright © 2010 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 13 March 2010

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        ASPLOS XV Paper Acceptance Rate32of181submissions,18%Overall Acceptance Rate535of2,713submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader