ABSTRACT
For decades, compilers have relied on dependence analysis to determine the legality of their transformations. While this conservative approach has enabled many robust optimizations, when it comes to parallelization there are many opportunities that can only be exploited by changing or re-ordering the dependences in the program.
This paper presents Alter: a system for identifying and enforcing parallelism that violates certain dependences while preserving overall program functionality. Based on programmer annotations, Alter exploits new parallelism in loops by reordering iterations or allowing stale reads. Alter can also infer which annotations are likely to benefit the program by using a test-driven framework.
Our evaluation of Alter demonstrates that it uncovers parallelism that is beyond the reach of existing static and dynamic tools. Across a selection of 12 performance-intensive loops, 9 of which have loop-carried dependences, Alter obtains an average speedup of 2.0x on 4 cores.
- The Parallel Dwarfs Project. http://paralleldwarfs.codeplex.com.Google Scholar
- F. Aleen and N. Clark. Commutativity Analysis for Software Parallelization: Letting Program Transformations See the Big Picture. In ASPLOS, 2009. Google ScholarDigital Library
- P. An, A. Jula, S. Rus, S. Saunders, T. Smith, G. Tanase, N. Thomas, N. Amato, and L. Rauchwerger. STAPL: An Adaptive, Generic Parallel C++ Library. LNCS, 2624:195--210, 2003. Google ScholarDigital Library
- K. Asanovic, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer, J. Kubiatowicz, N. Morgan, D. Patterson, K. Sen, J. Wawrzynek, D. Wessel, and K. Yelick. A view of the parallel computing landscape. CACM, 52(10), 2009. Google ScholarDigital Library
- E. Berger, K. McKinley, R. Blumofe, and P. Wilson. Hoard: A Scalable Memory Allocator for Multithreaded Applications. In ASPLOS, 2000. Google ScholarDigital Library
- E. Berger, T. Yang, T. Liu, and G. Novark. Grace: Safe Multithreaded Programming for C/C++. In OOPSLA, 2009. Google ScholarDigital Library
- M. Bridges, N. Vachharajani, Y. Zhang, T. Jablin, and D. August. Revisiting the sequential programming model for multi-core. In MICRO, 2007. Google ScholarDigital Library
- S. Burckhardt, A. Baldassion, and D. Leijen. Concurrent Programming with Revisions and Isolation Types. In OOPSLA, 2010. Google ScholarDigital Library
- M. Burke and R. Cytron. Interprocedural dependence analysis and parallelization. SIGPLAN Notices, 2004. Google ScholarDigital Library
- M. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University, June 1996. Google ScholarDigital Library
- T. Chen, F. Min, V. Nagarajan, and R. Gupta. Copy or Discard Execution Model for Speculative Parallelization on Multicores. In MICRO, 2008. Google ScholarDigital Library
- A. Darte, Y. Robert, and F. Vivien. Scheduling and Automatic Parallelization. Birkhauser Boston, 1st edition, 2000. Google ScholarDigital Library
- K. Datta. Auto-tuning Stencil Codes for Cache-Based Multicore Platforms. PhD thesis, UC Berkeley, December 2009. Google ScholarDigital Library
- R. Dias, J. Seco, and J. Lourenco. Snapshot isolation anomalies detection in software transactional memory. In INForum, 2010.Google Scholar
- C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. Software Behavior Oriented Parallelization. In PLDI, 2007. Google ScholarDigital Library
- R. Guerraoui and M. Kapalka. On the Correctness of Transactional Memory. In PPoPP, 2008. Google ScholarDigital Library
- M. Hall, J. Anderson, S. Amarasinghe, B. Murphy, S. Liao, E. Bugnion, and M. Lam. Maximizing Multiprocessor Performance with the SUIF Compiler. IEEE Computer, 29:84--89, 1996. Google ScholarDigital Library
- M. Herlihy and J. E. B. Moss. Transactional Memory: Architectural Support for Lock-free Data Structures. In ISCA, 1993. Google ScholarDigital Library
- W. Hwu, S. Ryoo, S. Ueng, J. Kelm, I. Gelado, S. Stone, R. Kidd, S. Baghsorkhi, A. Mahesri, S. Tsao, N. Navarro, S. Lumetta, M. Frank, and S. Patel. Implicitly parallel programming models for thousand-core microprocessors. In DAC, 2007. Google ScholarDigital Library
- K. Kennedy and J. Allen. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann Publishers Inc., 2002. Google ScholarDigital Library
- V. Krishnan and J. Torrellas. A Chip-Multiprocessor Architecture with Speculative Multithreading. IEEE Trans. on Computers, 48(9), 1999. Google ScholarDigital Library
- M. Kulkarni, M. Burtscher, C. Cascaval, and K. Pingali. Lonestar: A Suite of Parallel Irregular Programs. In ISPASS, 2009.Google ScholarCross Ref
- M. Kulkarni, K. Pingali, G. Ramanarayanan, B. Walter, K. Bala, and L. P. Chew. Optimistic Parallelism Benefits from Data Partitioning. In ASPLOS, 2008. Google ScholarDigital Library
- M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic Parallelism Requires Abstractions. In PLDI, 2007. Google ScholarDigital Library
- J. Larus and R. Rajwar. Transactional Memory (Synthesis Lectures on Computer Architecture). Morgan & Claypool Publishers, 2007. Google ScholarDigital Library
- D. Maydan, S. Amarasinghe, and M. Lam. Array data-flow analysis and its use in array privatization. In POPL, 1993. Google ScholarDigital Library
- M. Mehrara, J. Hao, P. Hsu, and S. Mahlke. Parallelizing Sequential applications on Commodity Hardware using a Low-cost Software Transactional Memory. In PLDI, 2009. Google ScholarDigital Library
- C. Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP: Stanford Transactional Applications for Multi-Processing. In IISWC, 2008.Google Scholar
- S. Misailovic, D. Kim, and M. Rinard. Parallelizing Sequential Programs With Statistical Accuracy Tests. Technical Report MIT-CSAIL-TR-2010-038, MIT, Aug 2010.Google Scholar
- S. Moon and M. W. Hall. Evaluation of Predicated Array Data-flow Analysis for Automatic Parallelization. In PPoPP, 1999. Google ScholarDigital Library
- W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C. Cambridge University Press, 2nd edition, 1992.Google Scholar
- W. Pugh and D. Wonnacott. Constraint-based array dependence analysis. ACM TOPLAS., 1998. Google ScholarDigital Library
- E. Raman, N. Vachharajani, R. Rangan, and D. August. SPICE: Speculative Parallel Iteration Chunk Execution. In CGO, 2008. Google ScholarDigital Library
- L. Rauchwerger and D. Padua. The Privatizing DOALL Test: A Run-Time Technique for DOALL Loop Identification and Array Privatization. In ICS, 1994. Google ScholarDigital Library
- L. Rauchwerger and D. Padua. The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization. In PLDI, 1995. Google ScholarDigital Library
- T. Reps. Undecidability of context-sensitive data-independence analysis. ACM TOPLAS, 2000. Google ScholarDigital Library
- T. Riegel, C. Fetzer, and P. Felber. Snapshot isolation for software transactional memory. In TRANSACT, 2006.Google Scholar
- M. Rinard and P. Diniz. Commutativity Analysis: A New Analysis Technique for Parallelizing Compilers. ACM TOPLAS, 19(6), 1997. Google ScholarDigital Library
- G. Sohi, S. Breach, and T. N. Vijaykumar. Multiscalar Processors. In ISCA, 1995. Google ScholarDigital Library
- R. Tarjan. A Unified Approach to Path Problems. J. ACM, 28(3), 1981. Google ScholarDigital Library
- W. Thies, V. Chandrasekhar, and S. Amarasinghe. A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs. In MICRO, 2007. Google ScholarDigital Library
- C. Tian, M. Feng, and R. Gupta. Supporting Speculative Parallelization in the Presence of Dynamic Data Structures. In PLDI, 2010. Google ScholarDigital Library
- G. Tournavitis, Z. Wang, B. Franke, and M. O'Boyle. Towards a Holistic Approach to Auto-parallelization: Integrating Profile-driven Parallelism Detection and Machine-learning Based Mapping. PLDI, 2009. Google ScholarDigital Library
- N. Vachharajani, R. Rangan, E. Raman, M. Bridges, G. Ottoni, and D. August. Speculative Decoupled Software Pipelining. In PACT, 2007. Google ScholarDigital Library
- H. Vandierendonck, S. Rul, and K. Bosschere. The Paralax infrastructure: automatic parallelization with a helping hand. In PACT, 2010. Google ScholarDigital Library
Index Terms
- ALTER: exploiting breakable dependences for parallelization
Recommendations
ALTER: exploiting breakable dependences for parallelization
PLDI '11For decades, compilers have relied on dependence analysis to determine the legality of their transformations. While this conservative approach has enabled many robust optimizations, when it comes to parallelization there are many opportunities that can ...
Speculative separation for privatization and reductions
PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and ImplementationAutomatic parallelization is a promising strategy to improve application performance in the multicore era. However, common programming practices such as the reuse of data structures introduce artificial constraints that obstruct automatic ...
Parallel-stage decoupled software pipelining
CGO '08: Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimizationIn recent years, the microprocessor industry has embraced chip multiprocessors (CMPs), also known as multi-core architectures, as the dominant design paradigm. For existing and new applications to make effective use of CMPs, it is desirable that ...
Comments