skip to main content
research-article
Open access

PEAK—a fast and effective performance tuning system via compiler optimization orchestration

Published: 21 May 2008 Publication History

Abstract

Compile-time optimizations generally improve program performance. Nevertheless, degradations caused by individual compiler optimization techniques are to be expected. Feedback-directed optimization orchestration systems generate optimized code versions under a series of optimization combinations, evaluate their performance, and search for the best version. One challenge to such systems is to tune program performance quickly in an exponential search space. Another challenge is to achieve high program performance, considering that optimizations interact. Aiming at these two goals, this article presents an automated performance tuning system, PEAK, which searches for the best compiler optimization combinations for the important code sections in a program. The major contributions made in this work are as follows: (1) An algorithm called Combined Elimination (CE) is developed to explore the optimization space quickly and effectively; (2) Three fast and accurate rating methods are designed to evaluate the performance of an optimized code section based on a partial execution of the program; (3) An algorithm is developed to identify important code sections as candidates for performance tuning by trading off tuning speed and tuned program performance; and (4) A set of compiler tools are implemented to automate optimization orchestration. Orchestrating optimization options in SUN Forte compilers at the whole-program level, our CE algorithm improves performance by 10.8% over the SPEC CPU2000 FP baseline setting, compared to 5.6% improved by manual tuning. Orchestrating GCC O3 optimizations, CE improves performance by 12% over O3, the highest optimization level. Applying the rating methods, PEAK reduces tuning time from 2.19 hours to 5.85 minutes on average, while achieving equal or better program performance.

References

[1]
Adl-Tabatabai, A.-R., Cierniak, M., Lueh, G.-Y., Parikh, V. M., and Stichnoth, J. M. 1998. Fast, effective code generation in a just-in-time java compiler. In Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation. ACM, New York, 280--290.
[2]
Agakov, F., Bonilla, E., Cavazos, J., Franke, B., Fursin, G., O'Boyle, M. F. P., Thomson, J., Toussaint, M., and Williams, C. K. I. 2006. Using machine learning to focus iterative optimization. In CGO '06: Proceedings of the International Symposium on Code Generation and Optimization (Washington, DC). IEEE Computer Society Press, Los Alamitos, CA, 295--305.
[3]
Arnold, M., Fink, S., Grove, D., Hind, M., and Sweeney, P. F. 2000. Adaptive optimization in the Jalapeño JVM. In Proceedings of the 15th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications. ACM, New York, 47--65.
[4]
Arnold, M., Hind, M., and Ryder, B. G. 2002. Online feedback-directed optimization of java. In Proceedings of the 17th ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications. ACM, New York, 111--129.
[5]
Auslander, J., Philipose, M., Chambers, C., Eggers, S. J., and Bershad, B. N. 1996. Fast, effective dynamic compilation. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, 149--159.
[6]
Bala, V., Duesterwald, E., and Banerjia, S. 2000. Dynamo: A transparent dynamic optimization system. In Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation. ACM, New York, 1--12.
[7]
Bruening, D., Garnett, T., and Amarasinghe, S. 2003. An infrastructure for adaptive dynamic optimization. In Proceedings of the International Symposium on Computer Generation and Optimization (CGO '03). IEEE Computer Society Press, Los Alamitos, CA.
[8]
Cavazos, J. and O'Boyle, M. F. P. 2006. Method-specific dynamic compilation using logistic regression. In OOPSLA '06: Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications. ACM, New York, 229--240.
[9]
Chow, K. and Wu, Y. 1999. Feedback-directed selection and characterization of compiler optimizations. In Proceedings of the 2nd Workshop on Feedback Directed Optimizations. Israel.
[10]
Cierniak, M., Lueh, G.-Y., and Stichnoth, J. M. 2000. Practicing JUDO: Java under dynamic optimizations. In Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation. ACM, New York, 13--26.
[11]
Click, C. and Cooper, K. D. 1995. Combining analyses, combining optimizations. ACM Trans. Prog. Lang. Syst. (TOPLAS) 17, 2, 181--196.
[12]
Consel, C. and Noël, F. 1996. A general approach for run-time specialization and its application to c. In Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM, New York, 145--156.
[13]
Cooper, K. D., Grosul, A., Harvey, T. J., Reeves, S., Subramanian, D., Torczon, L., and Waterman, T. 2005. Acme: adaptive compilation made efficient. In LCTES '05: Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems. ACM, New York, 69--77.
[14]
Cooper, K. D., Subramanian, D., and Torczon, L. 2002. Adaptive optimizing compilers for the 21st century. J. Supercomput. 23, 1, 7--22.
[15]
Dang, F., Yu, H., and Rauchwerger, L. 2002. The R-LRPD test: Speculative parallelization of partially parallel loops. In Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS '02). IEEE Computer Society Press, Los Alamitos, CA.
[16]
Diniz, P. C. and Rinard, M. C. 1997. Dynamic feedback: An effective technique for adaptive computing. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, 71--84.
[17]
Duesterwald, E. and Bala, V. 2000. Software profiling for hot path prediction: Less is more. In Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, New York, 202--211.
[18]
Ebcioglu, K. and Altman, E. R. 1997. DAISY: Dynamic compilation for 100. In Proceedings of the Annual International Symposium on Computer Architecture. 26--37.
[19]
Engler, D. R. 1996. VCODE: a retargetable, extensible, very fast dynamic code generation system. SIGPLAN Notes 31, 5, 160--170.
[20]
Engler, D. R. and Proebsting, T. A. 1994. DCG: An efficient, retargetable dynamic code generation system. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, New York, 263--272.
[21]
GNU. 2005. GCC online documentation. http://gcc.gnu.org/onlinedocs/.
[22]
Graham, S. L., Kessler, P. B., and McKusick, M. K. 1982. GPROF: A call graph execution profiler. In Proceedings of the SIGPLAN Symposium on Compiler Construction. ACM, New York, 120--126.
[23]
Granston, E. D. and Holler, A. 2001. Automatic recommendation of compiler options. In Proceedings of the 4th Workshop on Feedback-Directed and Dynamic Optimization (FDDO-4).
[24]
Grant, B., Philipose, M., Mock, M., Chambers, C., and Eggers, S. J. 1999. An evaluation of staged run-time optimizations in dyc. In Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation. ACM, New York, 293--304.
[25]
Haneda, M., Knijnenburg, P. M. W., and Wijshoff, H. A. G. 2005. Generating new general compiler optimization settings. In ICS '05: Proceedings of the 19th Annual International Conference on Supercomputing. ACM, New York, 161--168.
[26]
Hedayat, A., Sloane, N., and Stufken, J. 1999. Orthogonal Arrays: Theory and Applications. Springer-Verlag, New York.
[27]
Joshi, R., Nelson, G., and Randall, K. 2002. Denali: A goal-directed superoptimizer. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation. ACM, New York, 304--314.
[28]
Karp, R. 1972. Reducibility among combinatorial problems. In Proceedings of the Symposium on the Complexity of Computer Computations. Plenum Press, New York, 85--103.
[29]
Kistler, T. and Franz, M. 2003. Continuous program optimization: A case study. ACM Trans. Program. Lang. Syst. 25, 4, 500--548.
[30]
Kisuki, T., Knijnenburg, P. M. W., and O'Boyle, M. F. P. 2000. Combined selection of tile sizes and unroll factors using iterative compilation. In Proceedings of the IEEE PACT. IEEE Computer Society Press, Los Alamitos, CA, 237--248.
[31]
Kisuki, T., Knijnenburg, P. M. W., O'Boyle, M. F. P., Bodin, F., and Wijshoff, H. A. G. 1999. A feasibility study in iterative compilation. In Proceedings of the International Symposium on High Performance Computing (ISHPC'99). 121--132.
[32]
Kulkarni, P., Hines, S., Hiser, J., Whalley, D., Davidson, J., and Jones, D. 2004. Fast searches for effective optimization phase sequences. In PLDI '04: Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation. ACM, New York, 171--182.
[33]
Kulkarni, P. A., Whalley, D. B., Tyson, G. S., and Davidson, J. W. 2006. Exhaustive optimization phase order space exploration. In CGO '06: Proceedings of the International Symposium on Code Generation and Optimization (Washington, DC). IEEE Computer Society Press, Los Alamitos, CA, 306--318.
[34]
Lau, J., Arnold, M., Hind, M., and Calder, B. 2006. Online performance auditing: Using hot optimizations without getting burned. In Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, New York.
[35]
Lee, P. and Leone, M. 1996. Optimizing ML with run-time code generation. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, 137--148.
[36]
Leone, M. and Lee, P. 1998. Dynamic specialization in the fabius system. ACM Comput. Surv. 30, 3es, 23.
[37]
Merten, M. C., Trick, A. R., Barnes, R. D., Nystrom, E. M., George, C. N., Gyllenhaal, J. C., and Mei W. Hwu, W. 2001. An architectural framework for runtime optimization. IEEE Trans. Comput. 50, 6, 567--589.
[38]
Merten, M. C., Trick, A. R., George, C. N., Gyllenhaal, J. C., and Hwu, W. W. 1999. A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization. In Proceedings of the 26th Annual International Symposium on Computer Architecture. IEEE Computer Society Press, Los Alamitos, CA, 136--147.
[39]
Mock, M., Chambers, C., and Eggers, S. J. 2000. CALPA: A tool for automating selective dynamic compilation. In Proceedings of the International Symposium on Microarchitecture. 291--302.
[40]
Nandy, S., Gao, X., and Ferrante, J. 2003. TFP: Time-sensitive, flow-specific profiling at runtime. In Proceedings of the Workshop on Languages and Compiling for Parallel Computing (LCPC).
[41]
Nystrom, E. M., Barnes, R. D., Merten, M. C., and Mei W. Hwu, W. 2001. Code reordering and speculation support for dynamic optimization systems. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques.
[42]
Pan, Z. and Eigenmann, R. 2004a. Compiler optimization orchestration for peak performance. Tech. Rep. TR-ECE-04-01, School of Electrical and Computer Engineering, Purdue University.
[43]
Pan, Z. and Eigenmann, R. 2004b. Rating compiler optimizations for automatic performance tuning. In SC2004: Proceedings of the High Performance Computing, Networking and Storage Conference. (10 pages).
[44]
Pan, Z. and Eigenmann, R. 2006. Fast and effective orchestration of compiler optimizations for automatic performance tuning. In Proceedings of the 4th Annual International Symposium on Code Generation and Optimization (CGO). (12 pages).
[45]
Pinkers, R. P. J., Knijnenburg, P. M. W., Haneda, M., and Wijshoff, H. A. G. 2004. Statistical selection of compiler options. In Proceedings of the IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS'04) (Volendam, The Netherlands). IEEE Computer Society Press, Los Alamitos, CA, 494--501.
[46]
Rauchwerger, L. and Padua, D. A. 1999. The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization. IEEE Trans. Para. Distrib. Syst. 10, 2 (Feb.), 160--180.
[47]
Rus, S., Rauchwerger, L., and Hoeflinger, J. 2002. Hybrid analysis: Static & dynamic memory reference analysis. In Proceedings of the 16th International Conference on Supercomputing. ACM, New York, 274--284.
[48]
SPEC. 2000. SPEC CPU2000 Results. http://www.spec.org/cpu2000/results.
[49]
Stephenson, M., Amarasinghe, S., Martin, M., and O'Reilly, U.-M. 2003. Meta optimization: improving compiler heuristics with machine learning. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation. ACM, New York, 77--90.
[50]
Strout, M. M., Carter, L., and Ferrante, J. 2003. Compile-time composition of run-time data and iteration reorderings. In Proceedings of the 2003 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, New York.
[51]
Sun. 2000. Forte C 6 /Sun WorkShop 6 Compilers C User's Guide. http://docs.sun.com/app/docs/doc/806-3567.
[52]
Triantafyllis, S., Vachharajani, M., Vachharajani, N., and August, D. I. 2003. Compiler optimization-space exploration. In Proceedings of the International Symposium on Code Generation and Optimization. 204--215.
[53]
Voss, M. and Eigenmann, R. 2000. ADAPT: Automated de-coupled adaptive program transformation. In Proceedings of the International Conference on Parallel Processing. IEEE Computer Society Press, Los Alamitos, CA, 163--170.
[54]
Voss, M. J. and Eigemann, R. 2001. High-level adaptive program optimization with ADAPT. In Proceedings of the 8th ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming. ACM, New York, 93--102.
[55]
Whaley, R. C. and Dongarra, J. 1998. Automatically tuned linear algebra software. In Proceedings of the SuperComputing 1998: High Performance Networking and Computing.
[56]
Wolf, M. E., Maydan, D. E., and Chen, D.-K. 1996. Combining loop transformations considering caches and scheduling. In Proceedings of the 29th Annual ACM/IEEE International Symposium on Microarchitecture. ACM, New York, 274--286.

Cited By

View all
  • (2023)A Chisel Framework for Flexible Design Space Exploration through a Functional ApproachACM Transactions on Design Automation of Electronic Systems10.1145/359076928:4(1-31)Online publication date: 5-Apr-2023
  • (2022)A Survey of Performance Tuning Techniques and Tools for Parallel ApplicationsIEEE Access10.1109/ACCESS.2022.314784610(15036-15055)Online publication date: 2022
  • (2020)CodeSeerProceedings of the 34th ACM International Conference on Supercomputing10.1145/3392717.3392741(1-11)Online publication date: 29-Jun-2020
  • Show More Cited By

Index Terms

  1. PEAK—a fast and effective performance tuning system via compiler optimization orchestration

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Transactions on Programming Languages and Systems
      ACM Transactions on Programming Languages and Systems  Volume 30, Issue 3
      May 2008
      245 pages
      ISSN:0164-0925
      EISSN:1558-4593
      DOI:10.1145/1353445
      Issue’s Table of Contents
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 21 May 2008
      Accepted: 01 May 2007
      Revised: 01 November 2006
      Received: 01 May 2006
      Published in TOPLAS Volume 30, Issue 3

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. Performance tuning
      2. dynamic compilation
      3. optimization orchestration

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)93
      • Downloads (Last 6 weeks)14
      Reflects downloads up to 02 Mar 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2023)A Chisel Framework for Flexible Design Space Exploration through a Functional ApproachACM Transactions on Design Automation of Electronic Systems10.1145/359076928:4(1-31)Online publication date: 5-Apr-2023
      • (2022)A Survey of Performance Tuning Techniques and Tools for Parallel ApplicationsIEEE Access10.1109/ACCESS.2022.314784610(15036-15055)Online publication date: 2022
      • (2020)CodeSeerProceedings of the 34th ACM International Conference on Supercomputing10.1145/3392717.3392741(1-11)Online publication date: 29-Jun-2020
      • (2019)FuncyTunerProceedings of the 48th International Conference on Parallel Processing10.1145/3337821.3337842(1-10)Online publication date: 5-Aug-2019
      • (2018)SPIRAL: Extreme Performance PortabilityProceedings of the IEEE10.1109/JPROC.2018.2873289106:11(1935-1968)Online publication date: Nov-2018
      • (2016)Clustering-Based Selection for the Exploration of Compiler Optimization SequencesACM Transactions on Architecture and Code Optimization10.1145/288361413:1(1-28)Online publication date: 28-Mar-2016
      • (2016)Tuning compilations landscape2016 2nd International Conference on Contemporary Computing and Informatics (IC3I)10.1109/IC3I.2016.7918029(575-583)Online publication date: Dec-2016
      • (2015)Use of Previously Acquired Positioning of Optimizations for Phase Ordering ExplorationProceedings of the 18th International Workshop on Software and Compilers for Embedded Systems10.1145/2764967.2764978(58-67)Online publication date: 1-Jun-2015
      • (2015)PETRAInternational Journal of Parallel Programming10.1007/s10766-014-0307-843:4(549-571)Online publication date: 1-Aug-2015
      • (2014)Exploration of compiler optimization sequences using clustering-based selectionACM SIGPLAN Notices10.1145/2666357.259782149:5(63-72)Online publication date: 12-Jun-2014
      • Show More Cited By

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Login options

      Full Access

      Figures

      Tables

      Media

      Share

      Share

      Share this Publication link

      Share on social media