research-article

ALTER: exploiting breakable dependences for parallelization

Authors:
Abhishek Udupa

University of Pennsylvania, Philadelphia, PA, USA

University of Pennsylvania, Philadelphia, PA, USA
View Profile

,
Kaushik Rajan

Microsoft Research India, Bangalore, India

Microsoft Research India, Bangalore, India
View Profile

,
William Thies

Microsoft Research India, Bangalore, India

Microsoft Research India, Bangalore, India
View Profile

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and ImplementationJune 2011Pages 480–491https://doi.org/10.1145/1993498.1993555

Published:04 June 2011Publication History

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 480–491

ABSTRACT

For decades, compilers have relied on dependence analysis to determine the legality of their transformations. While this conservative approach has enabled many robust optimizations, when it comes to parallelization there are many opportunities that can only be exploited by changing or re-ordering the dependences in the program.

This paper presents Alter: a system for identifying and enforcing parallelism that violates certain dependences while preserving overall program functionality. Based on programmer annotations, Alter exploits new parallelism in loops by reordering iterations or allowing stale reads. Alter can also infer which annotations are likely to benefit the program by using a test-driven framework.

Our evaluation of Alter demonstrates that it uncovers parallelism that is beyond the reach of existing static and dynamic tools. Across a selection of 12 performance-intensive loops, 9 of which have loop-carried dependences, Alter obtains an average speedup of 2.0x on 4 cores.

References

The Parallel Dwarfs Project. http://paralleldwarfs.codeplex.com.Google Scholar
F. Aleen and N. Clark. Commutativity Analysis for Software Parallelization: Letting Program Transformations See the Big Picture. In ASPLOS, 2009. Google ScholarDigital Library
P. An, A. Jula, S. Rus, S. Saunders, T. Smith, G. Tanase, N. Thomas, N. Amato, and L. Rauchwerger. STAPL: An Adaptive, Generic Parallel C++ Library. LNCS, 2624:195--210, 2003. Google ScholarDigital Library
K. Asanovic, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer, J. Kubiatowicz, N. Morgan, D. Patterson, K. Sen, J. Wawrzynek, D. Wessel, and K. Yelick. A view of the parallel computing landscape. CACM, 52(10), 2009. Google ScholarDigital Library
E. Berger, K. McKinley, R. Blumofe, and P. Wilson. Hoard: A Scalable Memory Allocator for Multithreaded Applications. In ASPLOS, 2000. Google ScholarDigital Library
E. Berger, T. Yang, T. Liu, and G. Novark. Grace: Safe Multithreaded Programming for C/C++. In OOPSLA, 2009. Google ScholarDigital Library
M. Bridges, N. Vachharajani, Y. Zhang, T. Jablin, and D. August. Revisiting the sequential programming model for multi-core. In MICRO, 2007. Google ScholarDigital Library
S. Burckhardt, A. Baldassion, and D. Leijen. Concurrent Programming with Revisions and Isolation Types. In OOPSLA, 2010. Google ScholarDigital Library
M. Burke and R. Cytron. Interprocedural dependence analysis and parallelization. SIGPLAN Notices, 2004. Google ScholarDigital Library
M. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University, June 1996. Google ScholarDigital Library
T. Chen, F. Min, V. Nagarajan, and R. Gupta. Copy or Discard Execution Model for Speculative Parallelization on Multicores. In MICRO, 2008. Google ScholarDigital Library
A. Darte, Y. Robert, and F. Vivien. Scheduling and Automatic Parallelization. Birkhauser Boston, 1st edition, 2000. Google ScholarDigital Library
K. Datta. Auto-tuning Stencil Codes for Cache-Based Multicore Platforms. PhD thesis, UC Berkeley, December 2009. Google ScholarDigital Library
R. Dias, J. Seco, and J. Lourenco. Snapshot isolation anomalies detection in software transactional memory. In INForum, 2010.Google Scholar
C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. Software Behavior Oriented Parallelization. In PLDI, 2007. Google ScholarDigital Library
R. Guerraoui and M. Kapalka. On the Correctness of Transactional Memory. In PPoPP, 2008. Google ScholarDigital Library
M. Hall, J. Anderson, S. Amarasinghe, B. Murphy, S. Liao, E. Bugnion, and M. Lam. Maximizing Multiprocessor Performance with the SUIF Compiler. IEEE Computer, 29:84--89, 1996. Google ScholarDigital Library
M. Herlihy and J. E. B. Moss. Transactional Memory: Architectural Support for Lock-free Data Structures. In ISCA, 1993. Google ScholarDigital Library
W. Hwu, S. Ryoo, S. Ueng, J. Kelm, I. Gelado, S. Stone, R. Kidd, S. Baghsorkhi, A. Mahesri, S. Tsao, N. Navarro, S. Lumetta, M. Frank, and S. Patel. Implicitly parallel programming models for thousand-core microprocessors. In DAC, 2007. Google ScholarDigital Library
K. Kennedy and J. Allen. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann Publishers Inc., 2002. Google ScholarDigital Library
V. Krishnan and J. Torrellas. A Chip-Multiprocessor Architecture with Speculative Multithreading. IEEE Trans. on Computers, 48(9), 1999. Google ScholarDigital Library
M. Kulkarni, M. Burtscher, C. Cascaval, and K. Pingali. Lonestar: A Suite of Parallel Irregular Programs. In ISPASS, 2009.Google ScholarCross Ref
M. Kulkarni, K. Pingali, G. Ramanarayanan, B. Walter, K. Bala, and L. P. Chew. Optimistic Parallelism Benefits from Data Partitioning. In ASPLOS, 2008. Google ScholarDigital Library
M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic Parallelism Requires Abstractions. In PLDI, 2007. Google ScholarDigital Library
J. Larus and R. Rajwar. Transactional Memory (Synthesis Lectures on Computer Architecture). Morgan & Claypool Publishers, 2007. Google ScholarDigital Library
D. Maydan, S. Amarasinghe, and M. Lam. Array data-flow analysis and its use in array privatization. In POPL, 1993. Google ScholarDigital Library
M. Mehrara, J. Hao, P. Hsu, and S. Mahlke. Parallelizing Sequential applications on Commodity Hardware using a Low-cost Software Transactional Memory. In PLDI, 2009. Google ScholarDigital Library
C. Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP: Stanford Transactional Applications for Multi-Processing. In IISWC, 2008.Google Scholar
S. Misailovic, D. Kim, and M. Rinard. Parallelizing Sequential Programs With Statistical Accuracy Tests. Technical Report MIT-CSAIL-TR-2010-038, MIT, Aug 2010.Google Scholar
S. Moon and M. W. Hall. Evaluation of Predicated Array Data-flow Analysis for Automatic Parallelization. In PPoPP, 1999. Google ScholarDigital Library
W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C. Cambridge University Press, 2nd edition, 1992.Google Scholar
W. Pugh and D. Wonnacott. Constraint-based array dependence analysis. ACM TOPLAS., 1998. Google ScholarDigital Library
E. Raman, N. Vachharajani, R. Rangan, and D. August. SPICE: Speculative Parallel Iteration Chunk Execution. In CGO, 2008. Google ScholarDigital Library
L. Rauchwerger and D. Padua. The Privatizing DOALL Test: A Run-Time Technique for DOALL Loop Identification and Array Privatization. In ICS, 1994. Google ScholarDigital Library
L. Rauchwerger and D. Padua. The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization. In PLDI, 1995. Google ScholarDigital Library
T. Reps. Undecidability of context-sensitive data-independence analysis. ACM TOPLAS, 2000. Google ScholarDigital Library
T. Riegel, C. Fetzer, and P. Felber. Snapshot isolation for software transactional memory. In TRANSACT, 2006.Google Scholar
M. Rinard and P. Diniz. Commutativity Analysis: A New Analysis Technique for Parallelizing Compilers. ACM TOPLAS, 19(6), 1997. Google ScholarDigital Library
G. Sohi, S. Breach, and T. N. Vijaykumar. Multiscalar Processors. In ISCA, 1995. Google ScholarDigital Library
R. Tarjan. A Unified Approach to Path Problems. J. ACM, 28(3), 1981. Google ScholarDigital Library
W. Thies, V. Chandrasekhar, and S. Amarasinghe. A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs. In MICRO, 2007. Google ScholarDigital Library
C. Tian, M. Feng, and R. Gupta. Supporting Speculative Parallelization in the Presence of Dynamic Data Structures. In PLDI, 2010. Google ScholarDigital Library
G. Tournavitis, Z. Wang, B. Franke, and M. O'Boyle. Towards a Holistic Approach to Auto-parallelization: Integrating Profile-driven Parallelism Detection and Machine-learning Based Mapping. PLDI, 2009. Google ScholarDigital Library
N. Vachharajani, R. Rangan, E. Raman, M. Bridges, G. Ottoni, and D. August. Speculative Decoupled Software Pipelining. In PACT, 2007. Google ScholarDigital Library
H. Vandierendonck, S. Rul, and K. Bosschere. The Paralax infrastructure: automatic parallelization with a helping hand. In PACT, 2010. Google ScholarDigital Library

Index Terms

ALTER: exploiting breakable dependences for parallelization
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
    2. General programming languages
      1. Language features
        Concurrent programming structures

Recommendations

ALTER: exploiting breakable dependences for parallelization
PLDI '11

For decades, compilers have relied on dependence analysis to determine the legality of their transformations. While this conservative approach has enabled many robust optimizations, when it comes to parallelization there are many opportunities that can ...
Read More
Speculative separation for privatization and reductions
PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation

Automatic parallelization is a promising strategy to improve application performance in the multicore era. However, common programming practices such as the reuse of data structures introduce artificial constraints that obstruct automatic ...
Read More
Parallel-stage decoupled software pipelining
CGO '08: Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization

In recent years, the microprocessor industry has embraced chip multiprocessors (CMPs), also known as multi-core architectures, as the dominant design paradigm. For existing and new applications to make effective use of CMPs, it is desirable that ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2011
668 pages
ISBN:9781450306638
DOI:10.1145/1993498
General Chair:
Mary Hall
University of Utah
,
Program Chair:
David Padua
University of Illinois at Urbana-Champaign
ACM SIGPLAN Notices Volume 46, Issue 6
PLDI '11
June 2011
652 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1993316
Issue’s Table of Contents
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 4 June 2011
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
automatic parallelization
dependences
deterministic
program annotations
software transactional memory
speculation
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate406of2,067submissions,20%
Upcoming Conference
PLDI '24

Sponsor:

sigplan

ACM SIGPLAN Conference on Programming Language Design and Implementation

June 24 - 28, 2024

Copenhagen , Denmark
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 36
  Total Citations
  View Citations
- 524
  Total Downloads
- Downloads (Last 12 months)11
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

ALTER: exploiting breakable dependences for parallelization

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation

ABSTRACT

References

Cited By

Index Terms

Recommendations

ALTER: exploiting breakable dependences for parallelization

Speculative separation for privatization and reductions

Parallel-stage decoupled software pipelining