Performance modeling under resource constraints using deep transfer learning

Published: 12 November 2017
DOI: 10.1145/3126908.3126969

ABSTRACT

Tuning application parameters for optimal performance is a challenging combinatorial problem. Hence, techniques for modeling the functional relationships between various input features in the parameter space and application performance are important. We show that simple statistical inference techniques are inadequate to capture these relationships. Even with more complex ensembles of models, the minimum coverage of the parameter space required via experimental observations is still quite large. We propose a deep learning based approach that can combine information from exhaustive observations collected at a smaller scale with limited observations collected at a larger target scale. The proposed approach is able to accurately predict performance in the regimes of interest to performance analysts while outperforming many traditional techniques. In particular, our approach can identify the best performing configurations even when trained using as few as 1% of observations at the target scale.
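
The transfer-learning scheme the abstract describes — pretrain a network on an exhaustive sweep at a small scale, then adapt it using a sparse (~1%) sample of observations at the target scale — can be illustrated with a short PyTorch sketch. This is a minimal illustration under assumptions, not the authors' model: the network shape, the synthetic data, the choice to freeze the shared layers, and all hyperparameters below are invented for the example.

```python
# Minimal sketch of small-scale -> target-scale transfer for performance
# modeling. Layer sizes, synthetic data, and the training schedule are
# illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class PerfModel(nn.Module):
    def __init__(self, n_features: int):
        super().__init__()
        # Shared trunk: learns the parameter-space -> performance mapping
        self.shared = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.head = nn.Linear(64, 1)  # predicts a scalar metric, e.g. runtime

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.shared(x))

def fit(model: nn.Module, X: torch.Tensor, y: torch.Tensor,
        epochs: int, lr: float) -> None:
    # Only optimize parameters that are still trainable (see freeze below).
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()

torch.manual_seed(0)
n_features = 8  # hypothetical number of tunable application parameters

# Exhaustive observations at the small scale (dense parameter sweep).
X_small = torch.rand(1024, n_features)
y_small = X_small.sum(dim=1, keepdim=True)           # stand-in response

# Sparse observations at the target scale (~1% of configurations); the
# response surface shifts with scale, which the fine-tuned head must absorb.
X_target = torch.rand(16, n_features)
y_target = 2.0 * X_target.sum(dim=1, keepdim=True) + 0.5

# Step 1: pretrain on the exhaustive small-scale data.
model = PerfModel(n_features)
fit(model, X_small, y_small, epochs=500, lr=1e-3)

# Step 2: freeze the shared trunk and fine-tune only the output head
# on the few target-scale observations (the transfer step).
for p in model.shared.parameters():
    p.requires_grad = False
fit(model, X_target, y_target, epochs=300, lr=1e-2)

# The adapted model can now rank candidate configurations at the target scale.
with torch.no_grad():
    print(model(X_target[:3]))
```

Whether to freeze the trunk, retrain the whole network at a lower learning rate, or include scale itself as an input feature is a design choice; the key idea is that the cheap small-scale sweep supplies most of the model's structure, and the handful of target-scale runs only correct it.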

Published in

SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2017, 801 pages
ISBN: 9781450351140
DOI: 10.1145/3126908
General Chair: Bernd Mohr
Program Chair: Padma Raghavan

Copyright © 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• research-article

Acceptance Rates

SC '17 paper acceptance rate: 61 of 327 submissions (19%). Overall acceptance rate: 1,516 of 6,373 submissions (24%).
