ABSTRACT
Tuning application parameters for optimal performance is a challenging combinatorial problem. Hence, techniques for modeling the functional relationships between input features in the parameter space and application performance are important. We show that simple statistical inference techniques are inadequate to capture these relationships. Even with more complex ensembles of models, the minimum coverage of the parameter space required through experimental observations is still quite large. We propose a deep-learning-based approach that combines information from exhaustive observations collected at a smaller scale with limited observations collected at a larger target scale. The proposed approach accurately predicts performance in the regimes of interest to performance analysts while outperforming many traditional techniques. In particular, our approach can identify the best-performing configurations even when trained using as few as 1% of observations at the target scale.
Index Terms
- Performance modeling under resource constraints using deep transfer learning