Performance modeling under resource constraints using deep transfer learning

Published: 12 November 2017
DOI: 10.1145/3126908.3126969

ABSTRACT

Tuning application parameters for optimal performance is a challenging combinatorial problem. Hence, techniques for modeling the functional relationships between various input features in the parameter space and application performance are important. We show that simple statistical inference techniques are inadequate to capture these relationships. Even with more complex ensembles of models, the minimum coverage of the parameter space required via experimental observations is still quite large. We propose a deep learning based approach that can combine information from exhaustive observations collected at a smaller scale with limited observations collected at a larger target scale. The proposed approach is able to accurately predict performance in the regimes of interest to performance analysts while outperforming many traditional techniques. In particular, our approach can identify the best performing configurations even when trained using as few as 1% of observations at the target scale.
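
The transfer-learning scheme the abstract describes — pretrain a network on an exhaustive sweep at a small scale, then adapt it using a sparse (~1%) sample of observations at the target scale — can be illustrated with a short PyTorch sketch. This is a minimal illustration under assumptions, not the authors' model: the network shape, the synthetic data, the choice to freeze the shared layers, and all hyperparameters below are invented for the example.

```python
# Minimal sketch of small-scale -> target-scale transfer for performance
# modeling. Layer sizes, synthetic data, and the training schedule are
# illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class PerfModel(nn.Module):
    def __init__(self, n_features: int):
        super().__init__()
        # Shared trunk: learns the parameter-space -> performance mapping
        self.shared = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.head = nn.Linear(64, 1)  # predicts a scalar metric, e.g. runtime

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.shared(x))

def fit(model: nn.Module, X: torch.Tensor, y: torch.Tensor,
        epochs: int, lr: float) -> None:
    # Only optimize parameters that are still trainable (see freeze below).
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()

torch.manual_seed(0)
n_features = 8  # hypothetical number of tunable application parameters

# Exhaustive observations at the small scale (dense parameter sweep).
X_small = torch.rand(1024, n_features)
y_small = X_small.sum(dim=1, keepdim=True)           # stand-in response

# Sparse observations at the target scale (~1% of configurations); the
# response surface shifts with scale, which the fine-tuned head must absorb.
X_target = torch.rand(16, n_features)
y_target = 2.0 * X_target.sum(dim=1, keepdim=True) + 0.5

# Step 1: pretrain on the exhaustive small-scale data.
model = PerfModel(n_features)
fit(model, X_small, y_small, epochs=500, lr=1e-3)

# Step 2: freeze the shared trunk and fine-tune only the output head
# on the few target-scale observations (the transfer step).
for p in model.shared.parameters():
    p.requires_grad = False
fit(model, X_target, y_target, epochs=300, lr=1e-2)

# The adapted model can now rank candidate configurations at the target scale.
with torch.no_grad():
    print(model(X_target[:3]))
```

Whether to freeze the trunk, retrain the whole network at a lower learning rate, or include scale itself as an input feature is a design choice; the key idea is that the cheap small-scale sweep supplies most of the model's structure, and the handful of target-scale runs only correct it.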

Published in

SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2017, 801 pages
ISBN: 9781450351140
DOI: 10.1145/3126908
General Chair: Bernd Mohr
Program Chair: Padma Raghavan

Copyright © 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article

Acceptance Rates

SC '17 paper acceptance rate: 61 of 327 submissions (19%). Overall acceptance rate: 1,516 of 6,373 submissions (24%).
