skip to main content
10.1145/2484904.2484911acmotherconferencesArticle/Chapter ViewAbstractPublication PagesadaptConference Proceedingsconference-collections
research-article

Sambamba: runtime adaptive parallel execution

Published: 22 January 2013 Publication History

Abstract

How can we exploit a microprocessor as efficiently as possible? The "classic" approach is static optimization at compile-time, conservatively optimizing a program while keeping all possible uses in mind. Further optimization can only be achieved by anticipating the actual usage profile: If we know, for instance, that two computations will be independent, we can run them in parallel. However, brute force parallelization may slow down execution due to its large overhead. But as this depends on runtime features, such as structure and size of input data, parallel execution needs to dynamically adapt to the runtime situation at hand.
Our SAMBAMBA framework implements such a dynamic adaptation for regular sequential C programs through adaptive dispatch between sequential and parallel function instances. In an evaluation of 14 programs, we show that automatic parallelization in combination with adaptive dispatch can lead to speed-ups of up to 5.2 fold on a quad-core machine with hyperthreading. At this point, we rely on programmer annotations but will get rid of this requirement as the platform evolves to support efficient speculative optimizations.

References

[1]
R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: an efficient multithreaded runtime system. In Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, PPOPP '95, pages 207--216, New York, NY, USA, 1995. ACM.
[2]
K.-F. Faxen, K. Popov, L. Albertsson, and S. Janson. Embla - data dependence profiling for parallel programming. In International Conference on Complex, Intelligent and Software Intensive Systems, CISIS '08, pages 780--785, 2008.
[3]
P. Felber, C. Fetzer, P. Marlier, and T. Riegel. Time-based software transactional memory. IEEE Transactions on Parallel and Distributed Systems, 21:1793--1807, 2010.
[4]
J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst., 9(3):319--349, July 1987.
[5]
C. Hammacher, K. Streit, S. Hack, and A. Zeller. Profiling java programs for parallelism. In Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering, IWMSE '09, pages 49--55. IEEE Computer Society, 2009.
[6]
M. Herlihy and J. E. B. Moss. Transactional memory: architectural support for lock-free data structures. In Proceedings of the 20th annual international symposium on computer architecture, ISCA '93, pages 289--300, New York, NY, USA, 1993. ACM.
[7]
C. Lattner and V. Adve. LLVM: a compilation framework for lifelong program analysis and transformation. In Code Generation and Optimization, 2004. CGO 2004. International Symposium on, pages 75--86, 2004.
[8]
C. Lattner, A. Lenharth, and V. Adve. Making context-sensitive points-to analysis with heap cloning practical for the real world. In Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, PLDI '07, pages 278--289, New York, NY, USA, 2007. ACM.
[9]
J. Mak and A. Mycroft. Critical-path-guided interactive parallelisation. In Proceedings of the 2011 40th International Conference on Parallel Processing Workshops, ICPPW '11, pages 427--436. IEEE Computer Society, 2011.
[10]
A. Raman, A. Zaks, J. W. Lee, and D. I. August. Parcae: a system for flexible parallel execution. In Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation, PLDI '12, pages 133--144, New York, NY, USA, 2012. ACM.
[11]
E. Raman, G. Ottoni, A. Raman, M. J. Bridges, and D. I. August. Parallel-stage decoupled software pipelining. In Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, CGO '08, pages 114--123, New York, NY, USA, 2008. ACM.
[12]
V. Sarkar. Automatic partitioning of a program dependence graph into parallel tasks. IBM J. Res. Dev., 35(5--6):779--804, Sept. 1991.
[13]
B. Steensgaard. Sequentializing program dependence graphs for irreducible programs. Technical report, Microsoft Research, Oct. 1993.
[14]
K. Streit, C. Hammacher, A. Zeller, and S. Hack. Sambamba: Framework for adaptive program optimization. http://www.sambamba.org.
[15]
K. Streit, C. Hammacher, A. Zeller, and S. Hack. Sambamba: A run-time system for online adaptive parallelization. In M. O'Boyle, editor, Compiler Construction, volume 7210 of Lecture Notes in Computer Science, pages 240--243. Springer Berlin Heidelberg, 2012.
[16]
T. Suganuma, T. Yasue, M. Kawahito, H. Komatsu, and T. Nakatani. A dynamic optimization framework for a java just-in-time compiler. In Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, OOP-SLA '01, pages 180--195, New York, NY, USA, 2001. ACM.
[17]
M. Wolfe. Doany: Not just another parallel loop. In U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors, Languages and Compilers for Parallel Computing, volume 757 of Lecture Notes in Computer Science, pages 421--433. Springer Berlin Heidelberg, 1993.

Cited By

View all
  • (2023)Genomic population structure and inbreeding history of Lake Superior caribouEcology and Evolution10.1002/ece3.1027813:7Online publication date: 7-Jul-2023
  • (2019)LernaACM Transactions on Storage10.1145/331036815:1(1-24)Online publication date: 22-Mar-2019
  • (2019)Processing transactions in a predefined orderProceedings of the 24th Symposium on Principles and Practice of Parallel Programming10.1145/3293883.3295730(120-132)Online publication date: 16-Feb-2019
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Other conferences
ADAPT '13: Proceedings of the 3rd International Workshop on Adaptive Self-Tuning Computing Systems
January 2013
31 pages
ISBN:9781450320221
DOI:10.1145/2484904
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • NVIDIA
  • Microsoft Research: Microsoft Research

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 January 2013

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. automatic parallelization
  2. just-in-time compilation
  3. parallelization

Qualifiers

  • Research-article

Funding Sources

Conference

ADAPT '13
Sponsor:
  • Microsoft Research

Acceptance Rates

ADAPT '13 Paper Acceptance Rate 7 of 13 submissions, 54%;
Overall Acceptance Rate 14 of 30 submissions, 47%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 02 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Genomic population structure and inbreeding history of Lake Superior caribouEcology and Evolution10.1002/ece3.1027813:7Online publication date: 7-Jul-2023
  • (2019)LernaACM Transactions on Storage10.1145/331036815:1(1-24)Online publication date: 22-Mar-2019
  • (2019)Processing transactions in a predefined orderProceedings of the 24th Symposium on Principles and Practice of Parallel Programming10.1145/3293883.3295730(120-132)Online publication date: 16-Feb-2019
  • (2018)LernaProceedings of the 11th ACM International Systems and Storage Conference10.1145/3211890.3211897(37-48)Online publication date: 4-Jun-2018
  • (2015)The Polyhedral Model of Nonlinear LoopsACM Transactions on Architecture and Code Optimization10.1145/283873412:4(1-27)Online publication date: 8-Dec-2015
  • (2015)Generalized Task ParallelismACM Transactions on Architecture and Code Optimization10.1145/272316412:1(1-25)Online publication date: 2-Apr-2015
  • (2015)PattyProceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores10.1145/2712386.2712392(153-163)Online publication date: 7-Feb-2015
  • (2015)Speculative Runtime Parallelization of Loop NestsProceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop10.1109/IPDPSW.2015.10(245-254)Online publication date: 25-May-2015

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media