skip to main content
10.1145/2642937.2642975acmconferencesArticle/Chapter ViewAbstractPublication PagesaseConference Proceedingsconference-collections
research-article
Free Access

Understanding performance stairs: elucidating heuristics

Published:15 September 2014Publication History

ABSTRACT

How do experts navigate the huge space of implementations for a given specification to find an efficient choice with minimal searching? Answer: They use "heuristics" -- rules of thumb that are more street wisdom than scientific fact. We provide a scientific justification for Dense Linear Algebra (DLA) heuristics by showing that only a few decisions (out of many possible) are critical to performance; once these decisions are made, the die is cast and only relatively minor performance improvements are possible. The (implementation x performance) space of DLA is stair-stepped. Each stair is a set of implementations with very similar performance and (surprisingly) share key design decision(s). High-performance stairs align with heuristics that prescribe certain decisions in a particular context. Stairs also tell us how to tailor the search engine of a DLA code generator to reduce the time it needs to find implementations that are as good or better than those crafted by experts.

References

  1. S. Amarel. Program synthesis as a theory formation task: Problem representations and solution methods. In Machine Learning: An Artificial Intelligence Approach: Volume II, pages 499--569. Kaufmann, Los Altos, CA, 1986.Google ScholarGoogle Scholar
  2. A. Auer, G. Baumgartner, D. Bernholdt, A. Bibireata, V. Choppella, D. C. and X. Gao, R. Harrison, S. Krishnamoorthy, S. Krishnan, S. Lam, Q. Lu, M. Nooijen, R. P. añd J. Ramanujam, P. Sadayappan, and A. Sibiryakov. Automatic code generation for many-body electronic structure methods: The Tensor Contractio n Engine. Molecular Physics, 2005.Google ScholarGoogle Scholar
  3. I. D. Baxter. Design Maintenance Systems. CACM, April 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. G. Belter, E. Jessup, I. Karlin, and J. G. Siek. Automating the generation of composed linear algebra kernels. In SC, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. P. Bientinesi et al. Families of algorithms related to the inversion of a symmetric positive definite matrix. ACM Trans. Math. Softw., 35(1):1--22, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK Users' Guide. SIAM, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. E. Chan, M. Heimlich, A. Purkayastha, and R. van de Geijn. Collective communication: theory, practice, and experience: Research articles. Concurrency and Computation: Practice & Experience, 19(13):1749--1783, Sept. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. J. Dongarra, J. Du Croz, S. Hammarling, and I. Duff. A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw., 16(1):1--17, Mar. 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Feigenspan, D. S. Batory, and T. L. Riché. Is the derivation of a model easier to understand than the model itself? In ICPC, pages 47--52, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  10. D. S. Frankel. Model Driven Architecture: Applying MDA to Enterprise Computing. John Wiley & Sons, Inc., 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. R. C. Gonçalves, D. Batory, and J. Sobral. ReFlO: An interactive tool for pipe-and-filter domain specification and program generation. submitted, 2013.Google ScholarGoogle Scholar
  12. J. Guo, K. Czarnecki, S. Apel, N. Siegmund, and A. Wasowski. Variability-aware performance prediction: A statistical learning approach. In Proceedings of IEEE/ACM International Conference on Automated Software Engineering (ASE), Nov 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Kleppe, J. Warmer, and W. Bast. MDA Explained: The Model-Driven Architecture. Addison-Wesley, Boston, MA, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. R. E. Korf. Artificial intelligence search algorithms. In In Algorithms and Theory of Computation Handbook. CRC Press, 1996.Google ScholarGoogle Scholar
  15. G. M. Lohman. Grammar-like functional rules for representing query optimization alternatives. In ACM SIGMOD, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. B. R. Mandelbrot. The Fractal Geometry of Nature. Macmillian, Philadelphia, PA, USA, 1983.Google ScholarGoogle ScholarCross RefCross Ref
  17. B. Marker, D. Batory, and R. van de Geijn. A case study in mechanically deriving dense linear algebra code. International Journal of High Performance Computing Applications, 27(4):439--452, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. B. Marker, D. Batory, and R. A. van de Geijn. Code generation and optimization of distributed-memory dense linear algebra kernels. In ICCS, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  19. B. Marker, J. Poulson, D. Batory, and R. A. van de Geijn. Designing linear algebra algorithms by transformation: Mechanizing the expert developer. In VECPAR, 2012.Google ScholarGoogle Scholar
  20. T. Meng Low, B. Marker, and R. van de Geijn. FLAME Working Note #64. Theory and practice of fusing loops when optimizing parallel dense linear algebra operations. Technical Report TR-12-18, The University of Texas at Austin, Department of Computer Sciences, 2012.Google ScholarGoogle Scholar
  21. T. Menzies, D. Owen, and J. Richardson. The strangest thing about software. Computer, 40(1):54--60, January 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. S. Nedunuri, D. R. Smith, and W. R. Cook. Synthesis of greedy algorithms using dominance relations. In NASA Formal Methods, pages 97--108, 2010.Google ScholarGoogle Scholar
  23. S. Nedunuri, D. R. Smith, and W. R. Cook. Theory and techniques for synthesizing efficient breadth-first search algorithms. In FM, pages 308--325, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  24. J. Poulson, B. Marker, R. A. van de Geijn, J. R. Hammond, and N. A. Romero. Elemental: A new framework for distributed memory dense matrix computations. ACM Trans. Math. Softw., 39(2):13:1--13:24, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J. Poulson, R. van de Geijn, and J. Bennighof. (Parallel) algorithms for reducing the generalized hermitian-definite eigenvalue problem. ACM Trans. on Math. Softw., 2012. submitted.Google ScholarGoogle Scholar
  26. M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", 93(2):232--275, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  27. T. Riché, R. Goncalves, B. Marker, and D. Batory. Pushouts in Software Architecture Design. In GPCE, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. N. Siegmund, S. S. Kolesnikov, C. Kästner, S. Apel, D. Batory, M. Rosenmüller, and G. Saake. Predicting performance via automated feature-interaction detection. In Proceedings of the 34th International Conference on Software Engineering, ICSE '12, pages 167--177, Piscataway, NJ, USA, 2012. IEEE Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. M. Snir, S. W. Otto, S. Huss-Lederman, D. W. Walker, and J. Dongarra. MPI: The Complete Reference. The MIT Press, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. R. A. van de Geijn and E. S. Quintana-Ortí. The Science of Programming Matrix Computations. www.lulu.com, 2008.Google ScholarGoogle Scholar
  31. F. Van Zee and R. van de Geijn. BLIS: A framework for rapid instantiation of blas functionality. ACM Trans. Math. Softw. accepted.Google ScholarGoogle Scholar
  32. R. C. Whaley and J. J. Dongarra. Automatically tuned linear algebra software. In Proceedings of SC'98, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. R. Williams, C. P. Gomes, and B. Selman. Backdoors to typical case complexity. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI'03, pages 1173--1178, San Francisco, CA, USA, 2003. Morgan Kaufmann Publishers Inc. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Understanding performance stairs: elucidating heuristics

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        ASE '14: Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering
        September 2014
        934 pages
        ISBN:9781450330138
        DOI:10.1145/2642937

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 15 September 2014

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        ASE '14 Paper Acceptance Rate82of337submissions,24%Overall Acceptance Rate82of337submissions,24%

        Upcoming Conference

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader