research-article

Sambamba: runtime adaptive parallel execution

Authors:

Clemens Hammacher,

Andreas Zeller,

Sebastian HackAuthors Info & Claims

ADAPT '13: Proceedings of the 3rd International Workshop on Adaptive Self-Tuning Computing Systems

Article No.: 7, Pages 1 - 6

https://doi.org/10.1145/2484904.2484911

Published: 22 January 2013 Publication History

Abstract

How can we exploit a microprocessor as efficiently as possible? The "classic" approach is static optimization at compile-time, conservatively optimizing a program while keeping all possible uses in mind. Further optimization can only be achieved by anticipating the actual usage profile: If we know, for instance, that two computations will be independent, we can run them in parallel. However, brute force parallelization may slow down execution due to its large overhead. But as this depends on runtime features, such as structure and size of input data, parallel execution needs to dynamically adapt to the runtime situation at hand.

Our SAMBAMBA framework implements such a dynamic adaptation for regular sequential C programs through adaptive dispatch between sequential and parallel function instances. In an evaluation of 14 programs, we show that automatic parallelization in combination with adaptive dispatch can lead to speed-ups of up to 5.2 fold on a quad-core machine with hyperthreading. At this point, we rely on programmer annotations but will get rid of this requirement as the platform evolves to support efficient speculative optimizations.

References

[1]

R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: an efficient multithreaded runtime system. In Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, PPOPP '95, pages 207--216, New York, NY, USA, 1995. ACM.

Digital Library

[2]

K.-F. Faxen, K. Popov, L. Albertsson, and S. Janson. Embla - data dependence profiling for parallel programming. In International Conference on Complex, Intelligent and Software Intensive Systems, CISIS '08, pages 780--785, 2008.

Digital Library

[3]

P. Felber, C. Fetzer, P. Marlier, and T. Riegel. Time-based software transactional memory. IEEE Transactions on Parallel and Distributed Systems, 21:1793--1807, 2010.

Digital Library

[4]

J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst., 9(3):319--349, July 1987.

Digital Library

[5]

C. Hammacher, K. Streit, S. Hack, and A. Zeller. Profiling java programs for parallelism. In Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering, IWMSE '09, pages 49--55. IEEE Computer Society, 2009.

Digital Library

[6]

M. Herlihy and J. E. B. Moss. Transactional memory: architectural support for lock-free data structures. In Proceedings of the 20th annual international symposium on computer architecture, ISCA '93, pages 289--300, New York, NY, USA, 1993. ACM.

Digital Library

[7]

C. Lattner and V. Adve. LLVM: a compilation framework for lifelong program analysis and transformation. In Code Generation and Optimization, 2004. CGO 2004. International Symposium on, pages 75--86, 2004.

Digital Library

[8]

C. Lattner, A. Lenharth, and V. Adve. Making context-sensitive points-to analysis with heap cloning practical for the real world. In Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, PLDI '07, pages 278--289, New York, NY, USA, 2007. ACM.

Digital Library

[9]

J. Mak and A. Mycroft. Critical-path-guided interactive parallelisation. In Proceedings of the 2011 40th International Conference on Parallel Processing Workshops, ICPPW '11, pages 427--436. IEEE Computer Society, 2011.

Digital Library

[10]

A. Raman, A. Zaks, J. W. Lee, and D. I. August. Parcae: a system for flexible parallel execution. In Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation, PLDI '12, pages 133--144, New York, NY, USA, 2012. ACM.

Digital Library

[11]

E. Raman, G. Ottoni, A. Raman, M. J. Bridges, and D. I. August. Parallel-stage decoupled software pipelining. In Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, CGO '08, pages 114--123, New York, NY, USA, 2008. ACM.

Digital Library

[12]

V. Sarkar. Automatic partitioning of a program dependence graph into parallel tasks. IBM J. Res. Dev., 35(5--6):779--804, Sept. 1991.

Digital Library

[13]

B. Steensgaard. Sequentializing program dependence graphs for irreducible programs. Technical report, Microsoft Research, Oct. 1993.

[14]

K. Streit, C. Hammacher, A. Zeller, and S. Hack. Sambamba: Framework for adaptive program optimization. http://www.sambamba.org.

[15]

K. Streit, C. Hammacher, A. Zeller, and S. Hack. Sambamba: A run-time system for online adaptive parallelization. In M. O'Boyle, editor, Compiler Construction, volume 7210 of Lecture Notes in Computer Science, pages 240--243. Springer Berlin Heidelberg, 2012.

Digital Library

[16]

T. Suganuma, T. Yasue, M. Kawahito, H. Komatsu, and T. Nakatani. A dynamic optimization framework for a java just-in-time compiler. In Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, OOP-SLA '01, pages 180--195, New York, NY, USA, 2001. ACM.

Digital Library

[17]

M. Wolfe. Doany: Not just another parallel loop. In U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors, Languages and Compilers for Parallel Computing, volume 757 of Lecture Notes in Computer Science, pages 421--433. Springer Berlin Heidelberg, 1993.

Digital Library

Cited By

Solmundson KBowman JManseau MTaylor RKeobouasone SWilson P(2023)Genomic population structure and inbreeding history of Lake Superior caribouEcology and Evolution10.1002/ece3.1027813:7Online publication date: 7-Jul-2023
https://doi.org/10.1002/ece3.10278
Saad MPalmieri RRavindran B(2019)LernaACM Transactions on Storage10.1145/331036815:1(1-24)Online publication date: 22-Mar-2019
https://dl.acm.org/doi/10.1145/3310368
Saad MKishi MJing SHans SPalmieri RHollingsworth JKeidar I(2019)Processing transactions in a predefined orderProceedings of the 24th Symposium on Principles and Practice of Parallel Programming10.1145/3293883.3295730(120-132)Online publication date: 16-Feb-2019
https://dl.acm.org/doi/10.1145/3293883.3295730
Show More Cited By

Index Terms

Sambamba: runtime adaptive parallel execution
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language types
        Parallel programming languages

Recommendations

Sambamba: a runtime system for online adaptive parallelization
CC'12: Proceedings of the 21st international conference on Compiler Construction

How can we exploit a microprocessor as efficiently as possible? The "classic" approach is static optimization at compile-time, optimizing a program for all possible uses. Further optimization can only be achieved by anticipating the actual usage profile:...
A comparison of automatic parallelization tools/compilers on the SGI origin 2000
SC '98: Proceedings of the 1998 ACM/IEEE conference on Supercomputing

Porting applications to new high performance parallel and distributed computing platforms is a challenging task. Since writing parallel code by hand is time consuming and costly, porting codes would ideally be automated by using some parallelization ...
Generalized just-in-time trace compilation using a parallel task farm in a dynamic binary translator
PLDI '11

Dynamic Binary Translation (DBT) is the key technology behind cross-platform virtualization and allows software compiled for one Instruction Set Architecture (ISA) to be executed on a processor supporting a different ISA. Under the hood, DBT is ...

Comments

Information & Contributors

Information

Published In

cover image ACM Other conferences

ADAPT '13: Proceedings of the 3rd International Workshop on Adaptive Self-Tuning Computing Systems

January 2013

31 pages

ISBN:9781450320221

DOI:10.1145/2484904

Conference Chairs:
Christophe Dubach
University of Edinburgh, UK
,
Grigori Fursin
INRIA, France

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

NVIDIA
Microsoft Research: Microsoft Research

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 January 2013

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Bundesministerium für Bildung und Forschung

Conference

ADAPT '13

Sponsor:

Microsoft Research

ADAPT '13: 3rd International Workshop on Adaptive Self-tuning Computing Systems

January 22, 2013

Berlin, Germany

Acceptance Rates

ADAPT '13 Paper Acceptance Rate 7 of 13 submissions, 54%;

Overall Acceptance Rate 14 of 30 submissions, 47%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

8
Total Citations
View Citations
168
Total Downloads

Downloads (Last 12 months)3
Downloads (Last 6 weeks)0

Reflects downloads up to 02 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Solmundson KBowman JManseau MTaylor RKeobouasone SWilson P(2023)Genomic population structure and inbreeding history of Lake Superior caribouEcology and Evolution10.1002/ece3.1027813:7Online publication date: 7-Jul-2023
https://doi.org/10.1002/ece3.10278
Saad MPalmieri RRavindran B(2019)LernaACM Transactions on Storage10.1145/331036815:1(1-24)Online publication date: 22-Mar-2019
https://dl.acm.org/doi/10.1145/3310368
Saad MKishi MJing SHans SPalmieri RHollingsworth JKeidar I(2019)Processing transactions in a predefined orderProceedings of the 24th Symposium on Principles and Practice of Parallel Programming10.1145/3293883.3295730(120-132)Online publication date: 16-Feb-2019
https://dl.acm.org/doi/10.1145/3293883.3295730
Saad MPalmieri RRavindran BBreitgand DYadgar GPorter DEyal I(2018)LernaProceedings of the 11th ACM International Systems and Storage Conference10.1145/3211890.3211897(37-48)Online publication date: 4-Jun-2018
https://dl.acm.org/doi/10.1145/3211890.3211897
Sukumaran-Rajam AClauss P(2015)The Polyhedral Model of Nonlinear LoopsACM Transactions on Architecture and Code Optimization10.1145/283873412:4(1-27)Online publication date: 8-Dec-2015
https://dl.acm.org/doi/10.1145/2838734
Streit KDoerfert JHammacher CZeller AHack S(2015)Generalized Task ParallelismACM Transactions on Architecture and Code Optimization10.1145/272316412:1(1-25)Online publication date: 2-Apr-2015
https://dl.acm.org/doi/10.1145/2723164
Molitorisz KMüller TTichy W(2015)PattyProceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores10.1145/2712386.2712392(153-163)Online publication date: 7-Feb-2015
https://dl.acm.org/doi/10.1145/2712386.2712392
Rajam ACampostrini LCaamaño JClauss P(2015)Speculative Runtime Parallelization of Loop NestsProceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop10.1109/IPDPSW.2015.10(245-254)Online publication date: 25-May-2015
https://dl.acm.org/doi/10.1109/IPDPSW.2015.10

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten