research-article

Speculative parallelization using software multi-threaded transactions

Authors:
Arun Raman

Princeton University, Princeton, NJ, USA

Princeton University, Princeton, NJ, USA
View Profile

,
Hanjun Kim

Princeton University, Princeton, NJ, USA

Princeton University, Princeton, NJ, USA
View Profile

,
Thomas R. Mason

Princeton University, Princeton, NJ, USA

Princeton University, Princeton, NJ, USA
View Profile

,
Thomas B. Jablin

Princeton University, Princeton, NJ, USA

Princeton University, Princeton, NJ, USA
View Profile

,
David I. August

Princeton University, Princeton, NJ, USA

Princeton University, Princeton, NJ, USA
View Profile

ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systemsMarch 2010Pages 65–76https://doi.org/10.1145/1736020.1736030

Published:13 March 2010Publication History

ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems

Pages 65–76

ABSTRACT

With the right techniques, multicore architectures may be able to continue the exponential performance trend that elevated the performance of applications of all types for decades. While many scientific programs can be parallelized without speculative techniques, speculative parallelism appears to be the key to continuing this trend for general-purpose applications. Recently-proposed code parallelization techniques, such as those by Bridges et al. and by Thies et al., demonstrate scalable performance on multiple cores by using speculation to divide code into atomic units (transactions) that span multiple threads in order to expose data parallelism. Unfortunately, most software and hardware Thread-Level Speculation (TLS) memory systems and transactional memories are not sufficient because they only support single-threaded atomic units. Multi-threaded Transactions (MTXs) address this problem, but they require expensive hardware support as currently proposed in the literature. This paper proposes a Software MTX (SMTX) system that captures the applicability and performance of hardware MTX, but on existing multicore machines. The SMTX system yields a harmonic mean speedup of 13.36x on native hardware with four 6-core processors (24 cores in total) running speculatively parallelized applications.

References

M. Abadi, T. Harris, and M. Mehrara. Transactional memory with strong atomicity using off-the-shelf memory protection hardware. In PPoPP '09: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 185--196. ACM, 2009. Google ScholarDigital Library
A.-R. Adl-Tabatabai, B. T. Lewis, V. Menon, B. R. Murphy, B. Saha, and T. Shpeisman. Compiler and runtime support for efficient software transactional memory. In PLDI '06: Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 26--37. ACM, June 2006. Google ScholarDigital Library
R. Allen and K. Kennedy. Optimizing compilers for modern architectures: A dependence-based approach. Morgan Kaufmann Publishers Inc., 2002. Google ScholarDigital Library
E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: safe multithreaded programming for C/C++. In OOPSLA '09: Proceeding of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, pages 81--96. ACM, 2009. Google ScholarDigital Library
A. Bhowmik and M. Franklin. A general compiler framework for speculative multithreading. In Proceedings of the 14th ACM Symposium on Parallel Algorithms and Architectures, pages 99--108, August 2002. Google ScholarDigital Library
C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: characterization and architectural implications. In PACT '08: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, pages 72--81. ACM, 2008. Google ScholarDigital Library
M. Bridges, N. Vachharajani, Y. Zhang, T. Jablin, and D. August. Revisiting the sequential programming model for multi-core. In MICRO'07: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, pages 69--84. IEEE Computer Society, 2007. Google ScholarDigital Library
M. J. Bridges. The VELOCITY Compiler: Extracting Efficient Multicore Execution from Legacy Sequential Codes. PhD thesis, Department of Computer Science, Princeton University, Princeton, New Jersey, United States, November 2008. Google ScholarDigital Library
C. Cascaval, C. Blundell, M. Michael, H. W. Cain, P. Wu, S. Chiras, and S. Chatterjee. Software transactional memory: Why is it only a research toy? Queue, 6(5):46--58, 2008. Google ScholarDigital Library
C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. Software behavior oriented parallelization. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 223--234. ACM, 2007. Google ScholarDigital Library
M. J. Garzarán, M. Prvulovic, J. M. Llabería, V. Vi Ünals, L. Rauchwerger, and J. Torrellas. Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors. ACM Transactions on Architecture Code Optimization, 2(3):247--279, 2005. Google ScholarDigital Library
J. Giacomoni, T. Moseley, and M. Vachharajani. FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue. In PPoPP '08: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 43--52. ACM, February 2008. Google ScholarDigital Library
GNU Image Manipulation Program. http://www.gimp.org. K. Kelsey, T. Bai, C. Ding, and C. Zhang. Fast Track: A software system for speculative program optimization. In CGO '09: Proceedings of the 2009 International Symposium on Code Generation and Optimization, pages 157--168. IEEE Computer Society, May 2009. Google ScholarDigital Library
M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 211--222. ACM, 2007. Google ScholarDigital Library
J. Larus and R. Rajwar. Transactional Memory (Synthesis Lectures on Computer Architecture). Morgan & Claypool Publishers, 2007. Google ScholarDigital Library
C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In CGO '04: Proceedings of the International Symposium on Code Generation and Optimization, page 75. IEEE Computer Society, 2004. Google ScholarDigital Library
M. Mehrara, J. Hao, P.-C. Hsu, and S. Mahlke. Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory. In PLDI '09: Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 166--176. ACM, 2009. Google ScholarDigital Library
C. E. Oancea and A. Mycroft. Software thread-level speculation: an optimistic library implementation. In IWMSE '08: Proceedings of the 1st International Workshop on Multicore Software Engineering, pages 23--32. ACM, 2008. Google ScholarDigital Library
M. Olszewski, J. Cutler, and J. G. Steffan. JudoSTM: A dynamic binary-rewriting approach to software transactional memory. In PACT'07: Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, pages 365--375. IEEE Computer Society, 2007. Google ScholarDigital Library
G. Ottoni, R. Rangan, A. Stoler, and D. I. August. Automatic thread extraction with decoupled software pipelining. In MICRO '05: Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture, pages 105--118. IEEE Computer Society, 2005. Google ScholarDigital Library
E. Raman. Parallelization Techniques with Improved Dependence Handling. PhD thesis, Department of Computer Science, Princeton University, Princeton, New Jersey, United States, June 2009. Google ScholarDigital Library
E. Raman, N. Vachharajani, R. Rangan, and D. I. August. Spice: speculative parallel iteration chunk execution. In CGO '08: Proceedings of the 2008 International Symposium on Code Generation and Optimization, pages 175--184. ACM, 2008. Google ScholarDigital Library
L. Rauchwerger and D. A. Padua. The LRPD test: Speculative runtime parallelization of loops with privatization and reduction parallelization. IEEE Transactions on Parallel and Distributed Systems, 10(2):160---180, 1999. Google ScholarDigital Library
P. Rundberg and P. Stenstrom. An all-software thread-level data dependence speculation system for multiprocessors. Journal of Instruction-Level Parallelism, 3, October 2001.Google Scholar
N. Shavit and D. Touitou. Software transactional memory. In PODC'95: Proceedings of the Fourteenth Annual ACM Symposium on Principles of Distributed Computing, pages 204--213. ACM, 1995. Google ScholarDigital Library
Standard Performance Evaluation Corporation (SPEC). http://www.spec.org.Google Scholar
J. G. Steffan, C. Colohan, A. Zhai, and T. C. Mowry. The STAMPede approach to thread-level speculation. ACM Transactions on Computer Systems, 23(3):253--300, February 2005. Google ScholarDigital Library
W. Thies, V. Chandrasekhar, and S. Amarasinghe. A practical approach to exploiting coarse-grained pipeline parallelism in C programs. In MICRO '07: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, pages 356--369. IEEE Computer Society, 2007. Google ScholarDigital Library
C. Tian, M. Feng, V. Nagarajan, and R. Gupta. Copy or discard execution model for speculative parallelization on multicores. In MICRO '08: Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture, pages 330--341. IEEE Computer Society, 2008. Google ScholarDigital Library
N. Vachharajani. Intelligent Speculation for Pipelined Multithreading. PhD thesis, Department of Computer Science, Princeton University, Princeton, New Jersey, United States, November 2008. Google ScholarDigital Library
N. Vachharajani, R. Rangan, E. Raman, M. J. Bridges, G. Ottoni, and D. I. August. Speculative decoupled software pipelining. In PACT'07: Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, pages 49--59. IEEE Computer Society, 2007. Google ScholarDigital Library
H. Zhong, M. Mehrara, S. Lieberman, and S. Mahlke. Uncovering hidden loop level parallelism in sequential applications. In HPCA '08: Proceedings of the 14th International Symposium on High-Performance Computer Architecture, 2008.Google Scholar
C. Zilles and G. Sohi. Master/slave speculative parallelization. In MICRO'02: Proceedings of the 35th Annual ACM/IEEE International Symposium on Microarchitecture, pages 85--96. IEEE Computer Society Press, 2002. Google ScholarDigital Library

Index Terms

Speculative parallelization using software multi-threaded transactions
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
    2. General programming languages
      1. Language features
        Concurrent programming structures

Recommendations

Speculative parallelization using software multi-threaded transactions
ASPLOS '10

With the right techniques, multicore architectures may be able to continue the exponential performance trend that elevated the performance of applications of all types for decades. While many scientific programs can be parallelized without speculative ...
Read More
Speculative parallelization using software multi-threaded transactions
ASPLOS '10

With the right techniques, multicore architectures may be able to continue the exponential performance trend that elevated the performance of applications of all types for decades. While many scientific programs can be parallelized without speculative ...
Read More
Scalable Speculative Parallelization on Commodity Clusters
MICRO '43: Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture

While clusters of commodity servers and switches are the most popular form of large-scale parallel computers, many programs are not easily parallelized for execution upon them. In particular, high inter-node communication cost and lack of globally ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems
March 2010
422 pages
ISBN:9781605588391
DOI:10.1145/1736020
General Chair:
James C. Hoe
Carnegie Mellon University, USA
,
Program Chair:
Vikram S. Adve
University of Illinois at Urbana-Champaign, USA
ACM SIGARCH Computer Architecture News Volume 38, Issue 1
ASPLOS '10
March 2010
399 pages
ISSN:0163-5964
DOI:10.1145/1735970
Issue’s Table of Contents
ACM SIGPLAN Notices Volume 45, Issue 3
ASPLOS '10
March 2010
399 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1735971
Issue’s Table of Contents
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 March 2010
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
automatic parallelization
loop-level parallelism
multi-threaded transactions
pipelined parallelism
software transactional memory
thread-level speculation
Qualifiers
- research-article
Conference

Acceptance Rates
ASPLOS XV Paper Acceptance Rate32of181submissions,18%Overall Acceptance Rate535of2,713submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 86
  Total Citations
  View Citations
- 1,108
  Total Downloads
- Downloads (Last 12 months)29
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Speculative parallelization using software multi-threaded transactions

ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Speculative parallelization using software multi-threaded transactions

Speculative parallelization using software multi-threaded transactions

Scalable Speculative Parallelization on Commodity Clusters