Autotuning algorithmic choice for input sensitivity

Authors:
Yufei Ding

North Carolina State University, USA

North Carolina State University, USA
View Profile

,
Jason Ansel

Massachusetts Institute of Technology, USA

Massachusetts Institute of Technology, USA
View Profile

,
Kalyan Veeramachaneni

Massachusetts Institute of Technology, USA

Massachusetts Institute of Technology, USA
View Profile

,
Xipeng Shen

North Carolina State University, USA

North Carolina State University, USA
View Profile

,
Una-May O’Reilly

Massachusetts Institute of Technology, USA

Massachusetts Institute of Technology, USA
View Profile

,
Saman Amarasinghe

Massachusetts Institute of Technology, USA

Massachusetts Institute of Technology, USA
View Profile

PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and ImplementationJune 2015Pages 379–390https://doi.org/10.1145/2737924.2737969

Published:03 June 2015Publication History

PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 379–390

ABSTRACT

A daunting challenge faced by program performance autotuning is input sensitivity, where the best autotuned configuration may vary with different input sets. This paper presents a novel two-level input learning algorithm to tackle the challenge for an important class of autotuning problems, algorithmic autotuning. The new approach uses a two-level input clustering method to automatically refine input grouping, feature selection, and classifier construction. Its design solves a series of open issues that are particularly essential to algorithmic autotuning, including the enormous optimization space, complex influence by deep input features, high cost in feature extraction, and variable accuracy of algorithmic choices. Experimental results show that the new solution yields up to a 3x speedup over using a single configuration for all inputs, and a 34x speedup over a traditional one-level method for addressing input sensitivity in program optimizations.

References

Government’s open data. http://www.data.org/.Google Scholar
UCI data sets. http://archive.ics.uci.edu/ml/datasets.Google Scholar
F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M. F. P. O’boyle, J. Thomson, M. Toussaint, and C. K. I. Williams. Using machine learning to focus iterative optimization. In International Symposium on Code Generation and Optimization, pages 295–305, 2006. Google ScholarDigital Library
L. Almagor, K. D. Cooper, A. Grosul, T. J. Harvey, S. W. Reeves, D. Subramanian, L. Torczon, and T. Waterman. Finding effective compilation sequences. In LCTES’04, pages 231–239, 2004. Google ScholarDigital Library
J. Ansel, Y. L. W. ans Cy Chan, M. Olszewski, A. Edelman, and S. Amarasinghe. Language and compiler support for auto-tuning variable-accuracy algorithms. In CGO, Chamonix, France, Apr 2011. Google ScholarDigital Library
J. Ansel, C. Chan, Y. L. Wong, M. Olszewski, Q. Zhao, A. Edelman, and S. Amarasinghe. PetaBricks: A language and compiler for algorithmic choice. In PLDI, Dublin, Ireland, Jun 2009. Google ScholarDigital Library
J. Ansel, S. Kamil, K. Veeramachaneni, J. Ragan-Kelley, J. Bosboom, U. O’Reilly, and S. Amarasinghe. Opentuner: An extensible framework for program autotuning. In Proceedings of The 23rd International Conference on Parallel Architectures and Compilation Techniques, 2014. Google ScholarDigital Library
J. Ansel, M. Pacula, S. Amarasinghe, and U.-M. O’Reilly. An efficient evolutionary algorithm for solving bottom up problems. In Annual Conference on Genetic and Evolutionary Computation, Dublin, Ireland, July 2011. Google ScholarDigital Library
J. Auslander, M. Philipose, C. Chambers, S. J. Eggers, and B. N. Bershad. Fast, effective dynamic compilation. In PLDI, 1996. Google ScholarDigital Library
W. Baek and T. Chilimbi. Green: A framework for supporting energyconscious programming using controlled approximation. In PLDI, June 2010. Google ScholarDigital Library
P. Berube, J. Amaral, R. Ho, and R. Silvera. Workload reduction for multi-input profile-directed optimization. In Proceedings of the IEEE / ACM International Symposium on Code Generation and Optimization, 2009. Google ScholarDigital Library
V. Bhat, M. Parashar,. Hua Liu, M. Khandekar, N. Kandasamy, and S. Abdelwahed. Enabling self-managing applications using model-based online control strategies. In International Conference on Autonomic Computing, Washington, DC, 2006. Google ScholarDigital Library
J. Bilmes, K. Asanovic, C.-W. Chin, and J. Demmel. Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology. In Proceedings of the ACM International Conference on Supercomputing, pages 340–347, 1997. Google ScholarDigital Library
F. Chang and V. Karamcheti. A framework for automatic adaptation of tunable distributed applications. Cluster Computing, 4, March 2001. Google ScholarDigital Library
Y. Chen, S. Fang, L. Eeckhout, O. Temam, and C. Wu. Iterative optimization for the data center. In ASPLOS, 2012. Google ScholarDigital Library
Y. Chen, Y. Huang, L. Eeckhout, G. Fursin, L. Peng, O. Temam, and C. Wu. Evaluating iterative optimization across 1000 datasets. In Proceedings of the ACM SIGPLAN conference on Programming language design and implementation, PLDI’10, pages 448–459, 2010. Google ScholarDigital Library
P. C. Diniz and M. C. Rinard. Dynamic feedback: an effective technique for adaptive computing. In PLDI, New York, NY, 1997. Google ScholarDigital Library
S. Fang, Z. Du, Y. Fang, Y. Huang, Y. Chen, L. Eeckhout, O. Temam, H. Li, Y. Chen, and C. Wu. Performance portability across heterogeneous socs using a generalized library-based approach. ACM Transactions on Architecture and Code Optimization, 11, 2014. Google ScholarDigital Library
M. Frigo and S. G. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2):216–231, 2005.Google ScholarCross Ref
M. Frigo and S. G. Johnson. The design and implementation of FFTW3. IEEE, 93(2), February 2005. Invited paper, special issue on “Program Generation, Optimization, and Platform Adaptation”.Google ScholarCross Ref
G. Fursin, A. Cohen, M. O’Boyle, and O. Temam. Quick and practical run-time evaluation of multiple program optimizations. Transactions on High-Performance Embedded Architectures and Compilers, 4050:34– 53, 2007. Google ScholarDigital Library
G. Fursin, C. Miranda, O. Temam, M. Namolaru, E. Yom-Tov, A. Zaks, B. Mendelson, E. Bonilla, J. Thomson, H. Leather, C. Williams, M. O’Boyle, P. Barnard, E. Ashton, E. Courtois, and F. Bodin. MILEPOST GCC: machine learning based research compiler. In Proceedings of the GCC Developers’ Summit, Jul 2008.Google Scholar
T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning. Springer, 2001.Google ScholarCross Ref
P. Hawkins, A. Aiken, K. Fisher, M. Rinard, and M. Sagiv. Data representation synthesis. In Proceedings of ACM SIGPLAN Conference on Programming Languages Design and Implementation, 2012. Google ScholarDigital Library
H. Hoffmann, J. Eastep, M. D. Santambrogio, J. E. Miller, and A. Agarwal. Application heartbeats: a generic interface for specifying program performance and goals in autonomous computing environments. In ICAC, New York, NY, 2010. Google ScholarDigital Library
H. Hoffmann, S. Misailovic, S. Sidiroglou, A. Agarwal, and M. Rinard. Using code perforation to improve performance, reduce energy consumption, and respond to failures. Technical Report MIT-CSAILTR-2209-042, Massachusetts Institute of Technology, Sep 2009.Google Scholar
H. Hoffmann, S. Sidiroglou, M. Carbin, S. Misailovic, A. Agarwal, and M. Rinard. Power-aware computing with dynamic knobs. In ASPLOS, 2011.Google ScholarDigital Library
E. Im and K. Yelick. Optimizing sparse matrix computations for register reuse in SPARSITY. In International Conference on Computational Science, 2001. Google ScholarDigital Library
E.-J. Im, K. Yelick, and R. Vuduc. Sparsity: Optimization framework for sparse matrix kernels. Int. J. High Perform. Comput. Appl., 18(1):135–158, 2004. Google ScholarDigital Library
C. Jung, S. Rus, B. P. Railing, N. Clark, and S. Pande. Brainy: effective selection of data structures. In Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation, PLDI ’11, pages 86–97, New York, NY, USA, 2011. ACM. Google ScholarDigital Library
G. Karsai, A. Ledeczi, J. Sztipanovits, G. Peceli, G. Simon, and T. Kovacshazy. An approach to self-adaptive software based on supervisory control. In International Workshop in Self-adaptive software, 2001. Google ScholarDigital Library
X. Li, M. J. Garzarán, and D. Padua. Optimizing sorting with genetic algorithms. In CGO, 2005. Google ScholarDigital Library
Y. Liu, E. Z. Zhang, and X. Shen. A cross-input adaptive framework for gpu programs optimization. In Proceedings of International Parallel and Distribute Processing Symposium (IPDPS), pages 1–10, 2009. Google ScholarDigital Library
S. Muralidharan, M. Shantharam, M. Hall, M. Garland, and B. Catanzaro. Nitro: A framework for adaptive code variant tuning. In Parallel and Distributed Processing Symposium, 2014 IEEE 28th International, pages 501–512. IEEE, 2014. Google ScholarDigital Library
E. Park, L.-N. Pouche, J. Cavazos, A. Cohen, and P. Sadayappan. Predictive modeling in a polyhedral optimization space. In IEEE/ACM International Symposium on Code Generation and Optimization, pages 119 –129, April 2011. Google ScholarDigital Library
M. Puschel, J. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. Johnson, and N. Rizzolo. SPIRAL: code generation for DSP transforms. Proceedings of the IEEE, 93(2):232–275, 2005.Google ScholarCross Ref
M. Püschel, J. M. F. Moura, B. Singer, J. Xiong, J. R. Johnson, D. A. Padua, M. M. Veloso, and R. W. Johnson. Spiral: A generator for platform-adapted libraries of signal processing alogorithms. IJHPCA, 18(1), 2004. Google ScholarDigital Library
J. Quinlan. Induction of decision trees. Machine learning, 1(1):81–106, 1986. Google ScholarCross Ref
M. Samadi, A. Hormati, M. Mehrara, J. Lee, and S. Mahlke. Adaptive input-aware compilation for graphics engines. In Proceedings of ACM SIGPLAN 2012 Conference on Programming Language Design and Implementation, 2012. Google ScholarDigital Library
C. Tapus, I.-H. Chung, and J. K. Hollingsworth. Active harmony: Towards automated performance tuning. In In Proceedings from the Conference on High Performance Networking and Computing, pages 1–11, 2003.Google Scholar
N. Thomas, G. Tanase, O. Tkachyshyn, J. Perdue, N. M. Amato, and L. Rauchwerger. A framework for adaptive algorithm selection in STAPL. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 277–288, 2005. Google ScholarDigital Library
K. Tian, Y. Jiang, E. Zhang, and X. Shen. An input-centric paradigm for program dynamic optimizations. In the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2010. Google ScholarDigital Library
M. Voss and R. Eigenmann. Adapt: Automated de-coupled adaptive program transformation. In International Conference on Parallel Processing, 2000. Google ScholarDigital Library
M. Voss and R. Eigenmann. High-level adaptive program optimization with adapt. ACM SIGPLAN Notices, 36(7), 2001. Google ScholarDigital Library
R. Vuduc, J. W. Demmel, and K. A. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Scientific Discovery through Advanced Computing Conference, Journal of Physics: Conference Series, San Francisco, CA, June 2005.Google Scholar
R. C. Whaley and J. J. Dongarra. Automatically tuned linear algebra software. In Supercomputing, Washington, DC, 1998. Google ScholarDigital Library
R. C. Whaley, A. Petitet, and J. Dongarra. Automated empirical optimizations of software and the ATLAS project. Parallel Computing, 27(1-2):3–35, 2001.Google ScholarDigital Library

Index Terms

Autotuning algorithmic choice for input sensitivity
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language types

Recommendations

Autotuning algorithmic choice for input sensitivity
PLDI '15

A daunting challenge faced by program performance autotuning is input sensitivity, where the best autotuned configuration may vary with different input sets. This paper presents a novel two-level input learning algorithm to tackle the challenge for an ...
Read More
Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers
Highlights
- Generate parallel CUDA code from sequential C input code using a compiler-based tool for key operators in Geometric Multigrid.
Abstract
GPUs, with their high bandwidths and computational capabilities are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model ...
Read More
Autotuning OpenACC work distribution via direct search
XSEDE '15: Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure

OpenACC provides a high-productivity API for programming GPUs and similar accelerator devices. One of the last steps in tuning OpenACC programs is selecting values for the num_gangs and vector_length clauses, which control how a parallel workload is ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2015
630 pages
ISBN:9781450334686
DOI:10.1145/2737924
General Chair:
David Grove
IBM Research, USA
,
Program Chair:
Steve Blackburn
Australian National University, Australia
ACM SIGPLAN Notices Volume 50, Issue 6
PLDI '15
June 2015
630 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2813885
Editor:
Andy Gill
University of Kansas, Lawrence, KS
Issue’s Table of Contents
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 3 June 2015
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Algorithmic Selection
Autotuning
Input Sensitivity
Variable Accuracy
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate406of2,067submissions,20%
Upcoming Conference
PLDI '24

Sponsor:

sigplan

ACM SIGPLAN Conference on Programming Language Design and Implementation

June 24 - 28, 2024

Copenhagen , Denmark
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 74
  Total Citations
  View Citations
- 1,303
  Total Downloads
- Downloads (Last 12 months)195
- Downloads (Last 6 weeks)25
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Autotuning algorithmic choice for input sensitivity

PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation

ABSTRACT

References

Cited By

Index Terms

Recommendations

Autotuning algorithmic choice for input sensitivity

Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers

Autotuning OpenACC work distribution via direct search