DOI: 10.1145/2134243.2134253
Research article

Limits of parallelism using dynamic dependency graphs

Published: 20 July 2009

ABSTRACT

The advance of multi-core processors has led to renewed interest in extracting parallelism from programs. It is sometimes useful to know how much parallelism is exploitable in the limit for general programs, to put into perspective the speedups of various parallelisation techniques. Wall's study [19] was one of the first to examine limits of parallelism in detail. We present an extension of Wall's analysis of limits of parallelism, constructing Dynamic Dependency Graphs from execution traces of a number of benchmark programs, which allows better visualisation of the types of dependencies that limit parallelism, as well as flexibility in transforming graphs when exploring possible optimisations. Some of the results of Wall and subsequent studies are confirmed, including that average available parallelism is often above 100, although this requires effective measures to resolve control dependencies as well as memory renaming. We also study how certain compiler artifacts affect the limits of parallelism. In particular, we show that the use of a spaghetti stack, a technique that implicitly renames stack memory and breaks chains of true dependencies on the stack pointer, can lead to a doubling of potential parallelism.
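
As background for how such limits are measured, the sketch below (a minimal illustration, not the paper's implementation) builds a dynamic dependency graph from an instruction trace by linking each instruction to the most recent writer of every location it reads, and reports available parallelism as the total number of dynamic instructions divided by the critical-path length, assuming unit latencies, perfect renaming of anti and output dependencies, and fully resolved control dependencies. The trace format and names are illustrative assumptions.

    from collections import namedtuple

    # A dynamic trace entry: the locations (registers or memory addresses) an
    # instruction reads and writes. This format is an illustrative assumption,
    # not the trace format used in the paper.
    Instr = namedtuple("Instr", ["reads", "writes"])

    def available_parallelism(trace):
        """Build a dynamic dependency graph over `trace` and return
        total instructions / critical-path length, counting only true
        (read-after-write) dependencies under a unit-latency model."""
        last_writer = {}   # location -> index of its most recent writer
        depth = []         # depth[i] = longest dependence chain ending at instruction i

        for i, ins in enumerate(trace):
            d = 0
            for loc in ins.reads:
                if loc in last_writer:
                    d = max(d, depth[last_writer[loc]])
            depth.append(d + 1)
            for loc in ins.writes:
                last_writer[loc] = i

        critical_path = max(depth, default=1)
        return len(trace) / critical_path

    # Example: a three-instruction dependence chain plus one independent instruction.
    trace = [
        Instr(reads=("r0",), writes=("r1",)),
        Instr(reads=("r1",), writes=("r2",)),
        Instr(reads=("r2",), writes=("r3",)),
        Instr(reads=("r4",), writes=("r5",)),
    ]
    print(available_parallelism(trace))   # 4 instructions / critical path 3, about 1.33

In this idealised setting anti and output dependencies vanish and branches never serialise execution, which is the oracle regime in which average parallelism above 100 becomes reachable; the abstract's point is that realistic handling of control dependencies and memory renaming determines how much of that limit survives.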

References

  1. A. W. Appel and Z. Shao. An empirical and analytic study of stack vs. heap cost for languages with closures. Technical Report CS-TR-450-94, 1994.
  2. T. M. Austin and G. S. Sohi. Dynamic dependency analysis of ordinary programs. In Nineteenth International Symposium on Computer Architecture, pages 342--351, Gold Coast, Australia, 1992. ACM and IEEE Computer Society.
  3. F. Bellard. QEMU, a fast and portable dynamic translator. In ATEC '05: Proceedings of the Annual Conference on USENIX Annual Technical Conference, pages 41--41, Berkeley, CA, USA, 2005. USENIX Association.
  4. B. Blume, R. Eigenmann, K. Faigin, J. Grout, J. Hoe, D. Padua, P. Petersen, B. Pottenger, L. Rauchwerger, P. Tu, and S. Weatherford. Polaris: The next generation in parallelizing compilers. In Proceedings of the Workshop on Languages and Compilers for Parallel Computing. Springer-Verlag, Berlin/Heidelberg, 1994.
  5. H. J. Curnow and B. A. Wichmann. A synthetic benchmark. Computer Journal, 19(1):43--49, 1976.
  6. J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst., 9(3):319--349, 1987.
  7. S. C. Goldstein, K. E. Schauser, and D. E. Culler. Lazy threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 37(1):5--20, 1996.
  8. M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A free, commercially representative embedded benchmark suite. In WWC '01: Proceedings of the Fourth IEEE International Workshop on Workload Characterization, pages 3--14, Washington, DC, USA, 2001. IEEE Computer Society.
  9. K. Kennedy and J. R. Allen. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002.
  10. M. S. Lam and R. P. Wilson. Limits of control flow on parallelism. In Nineteenth International Symposium on Computer Architecture, pages 46--57, Gold Coast, Australia, 1992. ACM and IEEE Computer Society.
  11. M. H. Lipasti, C. B. Wilkerson, and J. P. Shen. Value locality and load value prediction. SIGOPS Oper. Syst. Rev., 30(5):138--147, 1996.
  12. G. Ottoni and D. I. August. Global multi-threaded instruction scheduling. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, pages 56--68, 2007.
  13. G. Ottoni, R. Rangan, A. Stoler, and D. I. August. Automatic thread extraction with decoupled software pipelining. In MICRO 38: Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture, pages 105--118, Washington, DC, USA, 2005. IEEE Computer Society.
  14. M. A. Postiff, D. A. Greene, G. S. Tyson, and T. N. Mudge. The limits of instruction level parallelism in SPEC95 applications. Computer Architecture News, 27(1):31--34, 1999.
  15. J. E. Smith. A study of branch prediction strategies. In ISCA '98: 25 Years of the International Symposia on Computer Architecture (Selected Papers), pages 202--215, New York, NY, USA, 1998. ACM.
  16. D. Stefanović and M. Martonosi. Limits and graph structure of available instruction-level parallelism (research note). Lecture Notes in Computer Science, 1900:1018--1022, 2001.
  17. J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry. A scalable approach to thread-level speculation. SIGARCH Comput. Archit. News, 28(2):1--12, 2000.
  18. H. Sutter. A fundamental turn toward concurrency in software. Dr. Dobb's Journal, 30(3):16--20, March 2005.
  19. D. W. Wall. Limits of instruction-level parallelism. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), volume 26, pages 176--189, New York, NY, 1991. ACM Press.
  20. R. P. Weicker. Dhrystone: a synthetic systems programming benchmark. Commun. ACM, 27(10):1013--1030, 1984.

Published in

WODA '09: Proceedings of the Seventh International Workshop on Dynamic Analysis
July 2009, 52 pages
ISBN: 9781605586564
DOI: 10.1145/2134243
Copyright © 2009 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
