
COBAYN: Compiler Autotuning Framework Using Bayesian Networks

Published: 14 June 2016

Abstract

The variety of today’s architectures forces programmers to spend a great deal of time porting and tuning application codes across different platforms. Compilers themselves need additional tuning, which is a complex task because the standard optimization levels, usually designed for the average case and a specific target architecture, often fail to deliver the best results.

This article proposes COBAYN (Compiler autotuning framework using BAYesian Networks), a compiler autotuning methodology that uses machine learning to speed up application performance and to reduce the cost of the compiler optimization phases. The framework characterizes applications dynamically by means of microarchitecture-independent features and uses Bayesian networks to predict promising compiler configurations. The article also presents an evaluation based on static analysis and hybrid feature-collection approaches, and compares Bayesian networks with several state-of-the-art machine-learning models.
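To make the flow concrete, below is a minimal, hypothetical Python sketch of the kind of model this suggests: learn, from training compilations, how likely each compiler flag is to help a program with given characteristics, then condition on the features of a new application and sample candidate configurations from the learned distribution. This is not the authors' implementation; the full Bayesian network is deliberately simplified here to independent per-flag probabilities conditioned on a discrete feature cluster, and all names (fit, sample_configs, the flag list) are illustrative.

    # Minimal, hypothetical sketch of COBAYN-style flag prediction; not the
    # authors' implementation. The full Bayesian network is simplified to
    # independent per-flag probabilities conditioned on a discrete
    # program-feature cluster.
    import random
    from collections import defaultdict

    FLAGS = ["-funroll-loops", "-ftree-vectorize", "-fomit-frame-pointer"]

    def fit(train_runs):
        """train_runs: (cluster_id, flag_vector) pairs taken from the
        best-performing compilations seen in training. Returns Laplace-
        smoothed estimates of P(flag = on | cluster)."""
        counts = defaultdict(lambda: [1, 2])  # [on-count, total], Laplace prior
        for cluster, flags in train_runs:
            for i, on in enumerate(flags):
                counts[(cluster, i)][0] += on
                counts[(cluster, i)][1] += 1
        return {key: on / total for key, (on, total) in counts.items()}

    def sample_configs(model, cluster, n):
        """Condition on the new application's feature cluster (the evidence)
        and sample n candidate flag configurations."""
        return [[f for i, f in enumerate(FLAGS)
                 if random.random() < model.get((cluster, i), 0.5)]
                for _ in range(n)]

    # Toy training data: cluster-0 programs benefited from vectorization.
    model = fit([(0, [0, 1, 1]), (0, [1, 1, 0]), (1, [1, 0, 1])])
    for cfg in sample_configs(model, cluster=0, n=3):
        print("gcc -O2", " ".join(cfg))

Only the sampled configurations, a small fraction of the full flag search space, then need to be compiled and measured on the target.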

Experiments were carried out on an ARM embedded platform with the GCC compiler, considering two benchmark suites comprising 39 applications. The set of compiler configurations selected by the model (less than 7% of the search space) demonstrated an application speedup of up to 4.6× on Polybench (1.85× on average) and 3.1× on cBench (1.54× on average) with respect to the standard optimization levels. Moreover, comparing the proposed technique with (i) random iterative compilation, (ii) machine-learning-based iterative compilation, and (iii) noniterative predictive modeling techniques shows, on average, 1.2×, 1.37×, and 1.48× speedup, respectively. Finally, the proposed method demonstrates 4× and 3× speedup on cBench and Polybench, respectively, in terms of exploration efficiency for the same quality of solutions generated by the random iterative-compilation model.
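For context on baseline (i), the following hedged sketch shows what random iterative compilation amounts to: draw random subsets of optimization flags on top of -O2, compile, time the binary, and keep the best configuration found within a fixed budget. The source file name, flag list, and budget are illustrative assumptions, not the paper's actual experimental setup.

    # Hedged sketch of the random iterative-compilation baseline: draw
    # random flag subsets on top of -O2, compile, time the binary, keep
    # the best. Source file, flag list, and budget are illustrative.
    import random
    import subprocess
    import time

    FLAGS = ["-funroll-loops", "-ftree-vectorize", "-fomit-frame-pointer",
             "-finline-functions", "-fprefetch-loop-arrays"]

    def evaluate(flags, src="kernel.c", exe="./kernel"):
        """Compile src with the given flags and return the wall-clock
        runtime of the resulting binary in seconds."""
        subprocess.run(["gcc", "-O2", *flags, src, "-o", exe], check=True)
        start = time.perf_counter()
        subprocess.run([exe], check=True)
        return time.perf_counter() - start

    def random_search(budget=20):
        best_flags, best_time = [], float("inf")
        for _ in range(budget):
            flags = [f for f in FLAGS if random.random() < 0.5]
            runtime = evaluate(flags)
            if runtime < best_time:
                best_flags, best_time = flags, runtime
        return best_flags, best_time

The exploration-efficiency claim above says the model reaches the quality such a random search finds, but with roughly 3× to 4× fewer compile-and-run evaluations.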



Published in

ACM Transactions on Architecture and Code Optimization, Volume 13, Issue 2
June 2016, 200 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/2952301

Copyright © 2016 ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor, or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 November 2015
• Revised: 1 March 2016
• Accepted: 1 April 2016
• Published: 14 June 2016
