ABSTRACT
A daunting challenge faced by program performance autotuning is input sensitivity, where the best autotuned configuration may vary with different input sets. This paper presents a novel two-level input learning algorithm to tackle the challenge for an important class of autotuning problems, algorithmic autotuning. The new approach uses a two-level input clustering method to automatically refine input grouping, feature selection, and classifier construction. Its design solves a series of open issues that are particularly essential to algorithmic autotuning, including the enormous optimization space, complex influence by deep input features, high cost in feature extraction, and variable accuracy of algorithmic choices. Experimental results show that the new solution yields up to a 3x speedup over using a single configuration for all inputs, and a 34x speedup over a traditional one-level method for addressing input sensitivity in program optimizations.
- Government’s open data. http://www.data.org/.Google Scholar
- UCI data sets. http://archive.ics.uci.edu/ml/datasets.Google Scholar
- F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M. F. P. O’boyle, J. Thomson, M. Toussaint, and C. K. I. Williams. Using machine learning to focus iterative optimization. In International Symposium on Code Generation and Optimization, pages 295–305, 2006. Google ScholarDigital Library
- L. Almagor, K. D. Cooper, A. Grosul, T. J. Harvey, S. W. Reeves, D. Subramanian, L. Torczon, and T. Waterman. Finding effective compilation sequences. In LCTES’04, pages 231–239, 2004. Google ScholarDigital Library
- J. Ansel, Y. L. W. ans Cy Chan, M. Olszewski, A. Edelman, and S. Amarasinghe. Language and compiler support for auto-tuning variable-accuracy algorithms. In CGO, Chamonix, France, Apr 2011. Google ScholarDigital Library
- J. Ansel, C. Chan, Y. L. Wong, M. Olszewski, Q. Zhao, A. Edelman, and S. Amarasinghe. PetaBricks: A language and compiler for algorithmic choice. In PLDI, Dublin, Ireland, Jun 2009. Google ScholarDigital Library
- J. Ansel, S. Kamil, K. Veeramachaneni, J. Ragan-Kelley, J. Bosboom, U. O’Reilly, and S. Amarasinghe. Opentuner: An extensible framework for program autotuning. In Proceedings of The 23rd International Conference on Parallel Architectures and Compilation Techniques, 2014. Google ScholarDigital Library
- J. Ansel, M. Pacula, S. Amarasinghe, and U.-M. O’Reilly. An efficient evolutionary algorithm for solving bottom up problems. In Annual Conference on Genetic and Evolutionary Computation, Dublin, Ireland, July 2011. Google ScholarDigital Library
- J. Auslander, M. Philipose, C. Chambers, S. J. Eggers, and B. N. Bershad. Fast, effective dynamic compilation. In PLDI, 1996. Google ScholarDigital Library
- W. Baek and T. Chilimbi. Green: A framework for supporting energyconscious programming using controlled approximation. In PLDI, June 2010. Google ScholarDigital Library
- P. Berube, J. Amaral, R. Ho, and R. Silvera. Workload reduction for multi-input profile-directed optimization. In Proceedings of the IEEE / ACM International Symposium on Code Generation and Optimization, 2009. Google ScholarDigital Library
- V. Bhat, M. Parashar,. Hua Liu, M. Khandekar, N. Kandasamy, and S. Abdelwahed. Enabling self-managing applications using model-based online control strategies. In International Conference on Autonomic Computing, Washington, DC, 2006. Google ScholarDigital Library
- J. Bilmes, K. Asanovic, C.-W. Chin, and J. Demmel. Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology. In Proceedings of the ACM International Conference on Supercomputing, pages 340–347, 1997. Google ScholarDigital Library
- F. Chang and V. Karamcheti. A framework for automatic adaptation of tunable distributed applications. Cluster Computing, 4, March 2001. Google ScholarDigital Library
- Y. Chen, S. Fang, L. Eeckhout, O. Temam, and C. Wu. Iterative optimization for the data center. In ASPLOS, 2012. Google ScholarDigital Library
- Y. Chen, Y. Huang, L. Eeckhout, G. Fursin, L. Peng, O. Temam, and C. Wu. Evaluating iterative optimization across 1000 datasets. In Proceedings of the ACM SIGPLAN conference on Programming language design and implementation, PLDI’10, pages 448–459, 2010. Google ScholarDigital Library
- P. C. Diniz and M. C. Rinard. Dynamic feedback: an effective technique for adaptive computing. In PLDI, New York, NY, 1997. Google ScholarDigital Library
- S. Fang, Z. Du, Y. Fang, Y. Huang, Y. Chen, L. Eeckhout, O. Temam, H. Li, Y. Chen, and C. Wu. Performance portability across heterogeneous socs using a generalized library-based approach. ACM Transactions on Architecture and Code Optimization, 11, 2014. Google ScholarDigital Library
- M. Frigo and S. G. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2):216–231, 2005.Google ScholarCross Ref
- M. Frigo and S. G. Johnson. The design and implementation of FFTW3. IEEE, 93(2), February 2005. Invited paper, special issue on “Program Generation, Optimization, and Platform Adaptation”.Google ScholarCross Ref
- G. Fursin, A. Cohen, M. O’Boyle, and O. Temam. Quick and practical run-time evaluation of multiple program optimizations. Transactions on High-Performance Embedded Architectures and Compilers, 4050:34– 53, 2007. Google ScholarDigital Library
- G. Fursin, C. Miranda, O. Temam, M. Namolaru, E. Yom-Tov, A. Zaks, B. Mendelson, E. Bonilla, J. Thomson, H. Leather, C. Williams, M. O’Boyle, P. Barnard, E. Ashton, E. Courtois, and F. Bodin. MILEPOST GCC: machine learning based research compiler. In Proceedings of the GCC Developers’ Summit, Jul 2008.Google Scholar
- T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning. Springer, 2001.Google ScholarCross Ref
- P. Hawkins, A. Aiken, K. Fisher, M. Rinard, and M. Sagiv. Data representation synthesis. In Proceedings of ACM SIGPLAN Conference on Programming Languages Design and Implementation, 2012. Google ScholarDigital Library
- H. Hoffmann, J. Eastep, M. D. Santambrogio, J. E. Miller, and A. Agarwal. Application heartbeats: a generic interface for specifying program performance and goals in autonomous computing environments. In ICAC, New York, NY, 2010. Google ScholarDigital Library
- H. Hoffmann, S. Misailovic, S. Sidiroglou, A. Agarwal, and M. Rinard. Using code perforation to improve performance, reduce energy consumption, and respond to failures. Technical Report MIT-CSAILTR-2209-042, Massachusetts Institute of Technology, Sep 2009.Google Scholar
- H. Hoffmann, S. Sidiroglou, M. Carbin, S. Misailovic, A. Agarwal, and M. Rinard. Power-aware computing with dynamic knobs. In ASPLOS, 2011.Google ScholarDigital Library
- E. Im and K. Yelick. Optimizing sparse matrix computations for register reuse in SPARSITY. In International Conference on Computational Science, 2001. Google ScholarDigital Library
- E.-J. Im, K. Yelick, and R. Vuduc. Sparsity: Optimization framework for sparse matrix kernels. Int. J. High Perform. Comput. Appl., 18(1):135–158, 2004. Google ScholarDigital Library
- C. Jung, S. Rus, B. P. Railing, N. Clark, and S. Pande. Brainy: effective selection of data structures. In Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation, PLDI ’11, pages 86–97, New York, NY, USA, 2011. ACM. Google ScholarDigital Library
- G. Karsai, A. Ledeczi, J. Sztipanovits, G. Peceli, G. Simon, and T. Kovacshazy. An approach to self-adaptive software based on supervisory control. In International Workshop in Self-adaptive software, 2001. Google ScholarDigital Library
- X. Li, M. J. Garzarán, and D. Padua. Optimizing sorting with genetic algorithms. In CGO, 2005. Google ScholarDigital Library
- Y. Liu, E. Z. Zhang, and X. Shen. A cross-input adaptive framework for gpu programs optimization. In Proceedings of International Parallel and Distribute Processing Symposium (IPDPS), pages 1–10, 2009. Google ScholarDigital Library
- S. Muralidharan, M. Shantharam, M. Hall, M. Garland, and B. Catanzaro. Nitro: A framework for adaptive code variant tuning. In Parallel and Distributed Processing Symposium, 2014 IEEE 28th International, pages 501–512. IEEE, 2014. Google ScholarDigital Library
- E. Park, L.-N. Pouche, J. Cavazos, A. Cohen, and P. Sadayappan. Predictive modeling in a polyhedral optimization space. In IEEE/ACM International Symposium on Code Generation and Optimization, pages 119 –129, April 2011. Google ScholarDigital Library
- M. Puschel, J. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. Johnson, and N. Rizzolo. SPIRAL: code generation for DSP transforms. Proceedings of the IEEE, 93(2):232–275, 2005.Google ScholarCross Ref
- M. Püschel, J. M. F. Moura, B. Singer, J. Xiong, J. R. Johnson, D. A. Padua, M. M. Veloso, and R. W. Johnson. Spiral: A generator for platform-adapted libraries of signal processing alogorithms. IJHPCA, 18(1), 2004. Google ScholarDigital Library
- J. Quinlan. Induction of decision trees. Machine learning, 1(1):81–106, 1986. Google ScholarCross Ref
- M. Samadi, A. Hormati, M. Mehrara, J. Lee, and S. Mahlke. Adaptive input-aware compilation for graphics engines. In Proceedings of ACM SIGPLAN 2012 Conference on Programming Language Design and Implementation, 2012. Google ScholarDigital Library
- C. Tapus, I.-H. Chung, and J. K. Hollingsworth. Active harmony: Towards automated performance tuning. In In Proceedings from the Conference on High Performance Networking and Computing, pages 1–11, 2003.Google Scholar
- N. Thomas, G. Tanase, O. Tkachyshyn, J. Perdue, N. M. Amato, and L. Rauchwerger. A framework for adaptive algorithm selection in STAPL. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 277–288, 2005. Google ScholarDigital Library
- K. Tian, Y. Jiang, E. Zhang, and X. Shen. An input-centric paradigm for program dynamic optimizations. In the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2010. Google ScholarDigital Library
- M. Voss and R. Eigenmann. Adapt: Automated de-coupled adaptive program transformation. In International Conference on Parallel Processing, 2000. Google ScholarDigital Library
- M. Voss and R. Eigenmann. High-level adaptive program optimization with adapt. ACM SIGPLAN Notices, 36(7), 2001. Google ScholarDigital Library
- R. Vuduc, J. W. Demmel, and K. A. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Scientific Discovery through Advanced Computing Conference, Journal of Physics: Conference Series, San Francisco, CA, June 2005.Google Scholar
- R. C. Whaley and J. J. Dongarra. Automatically tuned linear algebra software. In Supercomputing, Washington, DC, 1998. Google ScholarDigital Library
- R. C. Whaley, A. Petitet, and J. Dongarra. Automated empirical optimizations of software and the ATLAS project. Parallel Computing, 27(1-2):3–35, 2001.Google ScholarDigital Library
Index Terms
- Autotuning algorithmic choice for input sensitivity
Recommendations
Autotuning algorithmic choice for input sensitivity
PLDI '15A daunting challenge faced by program performance autotuning is input sensitivity, where the best autotuned configuration may vary with different input sets. This paper presents a novel two-level input learning algorithm to tackle the challenge for an ...
Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers
Highlights- Generate parallel CUDA code from sequential C input code using a compiler-based tool for key operators in Geometric Multigrid.
AbstractGPUs, with their high bandwidths and computational capabilities are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model ...
Autotuning OpenACC work distribution via direct search
XSEDE '15: Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced CyberinfrastructureOpenACC provides a high-productivity API for programming GPUs and similar accelerator devices. One of the last steps in tuning OpenACC programs is selecting values for the num_gangs and vector_length clauses, which control how a parallel workload is ...
Comments