ABSTRACT
The advance of multi-core processors has led to renewed interest in extracting parallelism from programs. It is sometimes useful to know how much parallelism is exploitable in the limit for general programs, to put the speedups of various parallelisation techniques into perspective. Wall's study [19] was one of the first to examine the limits of parallelism in detail. We present an extension of Wall's analysis, constructing Dynamic Dependency Graphs from execution traces of a number of benchmark programs. This allows better visualisation of the types of dependencies that limit parallelism, as well as flexibility in transforming the graphs when exploring possible optimisations. We confirm some of the results of Wall and subsequent studies, including that average available parallelism is often above 100, but that exploiting it requires effective measures to resolve control dependencies, as well as memory renaming. We also study how certain compiler artifacts affect the limits of parallelism. In particular, we show that the use of a spaghetti stack, a technique that implicitly renames stack memory and breaks chains of true dependencies on the stack pointer, can double the potential parallelism.
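In a Dynamic Dependency Graph, each dynamic instruction is placed one level below its latest producer, so the depth of the graph is the critical path of true (read-after-write) dependencies, and the ratio of trace length to depth is the average available parallelism. The following sketch illustrates this idea under idealised assumptions (perfect control resolution and renaming, so only true dependencies constrain scheduling); the trace representation and the `available_parallelism` helper are illustrative, not the paper's actual tooling:

```python
from collections import defaultdict

def available_parallelism(trace):
    """Average available parallelism of an instruction trace.

    trace: list of (reads, writes) pairs, where reads/writes are sets of
    register or memory location names. Only true (read-after-write)
    dependencies are modelled, mirroring an idealised machine with perfect
    branch prediction and full renaming.
    """
    last_writer_level = defaultdict(int)  # location -> DDG level of last write
    depth = 0
    for reads, writes in trace:
        # An instruction issues one level after all of its producers.
        level = 1 + max((last_writer_level[r] for r in reads), default=0)
        for w in writes:
            last_writer_level[w] = level
        depth = max(depth, level)
    return len(trace) / depth if depth else 0.0

# A serial dependency chain yields parallelism 1; fully independent
# instructions all sit at level 1, so parallelism equals the trace length.
chain = [({"a"}, {"a"})] * 4
independent = [(set(), {f"r{i}"}) for i in range(4)]
```

A chain of true dependencies on a single location (such as the stack pointer) collapses the whole trace onto one path; renaming that location, as the spaghetti stack does implicitly, breaks the chain and raises the measured limit.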
- A. W. Appel and Z. Shao. An empirical and analytic study of stack vs. heap cost for languages with closures. Technical Report CS-TR-450-94, 1994.
- T. M. Austin and G. S. Sohi. Dynamic dependency analysis of ordinary programs. In Nineteenth International Symposium on Computer Architecture, pages 342--351, Gold Coast, Australia, 1992. ACM and IEEE Computer Society.
- F. Bellard. QEMU, a fast and portable dynamic translator. In ATEC '05: Proceedings of the annual conference on USENIX Annual Technical Conference, pages 41--41, Berkeley, CA, USA, 2005. USENIX Association.
- B. Blume, R. Eigenmann, K. Faigin, J. Grout, J. Hoe, D. Padua, P. Petersen, B. Pottenger, L. Rauchwerger, P. Tu, and S. Weatherford. Polaris: The next generation in parallelizing compilers. In Proceedings of the Workshop on Languages and Compilers for Parallel Computing, pages 10--1. Springer-Verlag, Berlin/Heidelberg, 1994.
- H. J. Curnow and B. A. Wichmann. A synthetic benchmark. Computer Journal, 19(1):43--49, 1976.
- J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst., 9(3):319--349, 1987.
- S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5--20, 1996.
- M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In WWC '01: Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop, pages 3--14, Washington, DC, USA, 2001. IEEE Computer Society.
- K. Kennedy and J. R. Allen. Optimizing compilers for modern architectures: a dependence-based approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002.
- M. S. Lam and R. P. Wilson. Limits of control flow on parallelism. In Nineteenth International Symposium on Computer Architecture, pages 46--57, Gold Coast, Australia, 1992. ACM and IEEE Computer Society.
- M. H. Lipasti, C. B. Wilkerson, and J. P. Shen. Value locality and load value prediction. SIGOPS Oper. Syst. Rev., 30(5):138--147, 1996.
- G. Ottoni and D. I. August. Global multi-threaded instruction scheduling. In Proceedings of the 40th annual IEEE/ACM International Symposium on Microarchitecture, pages 56--68, 2007.
- G. Ottoni, R. Rangan, A. Stoler, and D. I. August. Automatic thread extraction with decoupled software pipelining. In MICRO 38: Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, pages 105--118, Washington, DC, USA, 2005. IEEE Computer Society.
- M. A. Postiff, D. A. Greene, G. S. Tyson, and T. N. Mudge. The limits of instruction level parallelism in SPEC95 applications. Computer Architecture News, 27(1):31--34, 1999.
- J. E. Smith. A study of branch prediction strategies. In ISCA '98: 25 years of the international symposia on Computer architecture (selected papers), pages 202--215, New York, NY, USA, 1998. ACM.
- D. Stefanović and M. Martonosi. Limits and graph structure of available instruction-level parallelism (research note). Lecture Notes in Computer Science, 1900:1018--1022, 2001.
- J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry. A scalable approach to thread-level speculation. SIGARCH Comput. Archit. News, 28(2):1--12, 2000.
- H. Sutter. A fundamental turn toward concurrency in software. Dr. Dobb's Journal, 30(3):16--20, March 2005.
- D. W. Wall. Limits of instruction-level parallelism. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), volume 26, pages 176--189, New York, NY, 1991. ACM Press.
- R. P. Weicker. Dhrystone: a synthetic systems programming benchmark. Commun. ACM, 27(10):1013--1030, 1984.
Index Terms
- Limits of parallelism using dynamic dependency graphs