ABSTRACT
How do experts navigate the huge space of implementations for a given specification to find an efficient choice with minimal searching? Answer: They use "heuristics" -- rules of thumb that are more street wisdom than scientific fact. We provide a scientific justification for Dense Linear Algebra (DLA) heuristics by showing that only a few decisions (out of many possible) are critical to performance; once these decisions are made, the die is cast and only relatively minor performance improvements are possible. The (implementation x performance) space of DLA is stair-stepped. Each stair is a set of implementations with very similar performance and (surprisingly) share key design decision(s). High-performance stairs align with heuristics that prescribe certain decisions in a particular context. Stairs also tell us how to tailor the search engine of a DLA code generator to reduce the time it needs to find implementations that are as good or better than those crafted by experts.
- S. Amarel. Program synthesis as a theory formation task: Problem representations and solution methods. In Machine Learning: An Artificial Intelligence Approach: Volume II, pages 499--569. Kaufmann, Los Altos, CA, 1986.Google Scholar
- A. Auer, G. Baumgartner, D. Bernholdt, A. Bibireata, V. Choppella, D. C. and X. Gao, R. Harrison, S. Krishnamoorthy, S. Krishnan, S. Lam, Q. Lu, M. Nooijen, R. P. añd J. Ramanujam, P. Sadayappan, and A. Sibiryakov. Automatic code generation for many-body electronic structure methods: The Tensor Contractio n Engine. Molecular Physics, 2005.Google Scholar
- I. D. Baxter. Design Maintenance Systems. CACM, April 1992. Google ScholarDigital Library
- G. Belter, E. Jessup, I. Karlin, and J. G. Siek. Automating the generation of composed linear algebra kernels. In SC, 2009. Google ScholarDigital Library
- P. Bientinesi et al. Families of algorithms related to the inversion of a symmetric positive definite matrix. ACM Trans. Math. Softw., 35(1):1--22, 2008. Google ScholarDigital Library
- L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK Users' Guide. SIAM, 1997. Google ScholarDigital Library
- E. Chan, M. Heimlich, A. Purkayastha, and R. van de Geijn. Collective communication: theory, practice, and experience: Research articles. Concurrency and Computation: Practice & Experience, 19(13):1749--1783, Sept. 2007. Google ScholarDigital Library
- J. J. Dongarra, J. Du Croz, S. Hammarling, and I. Duff. A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw., 16(1):1--17, Mar. 1990. Google ScholarDigital Library
- J. Feigenspan, D. S. Batory, and T. L. Riché. Is the derivation of a model easier to understand than the model itself? In ICPC, pages 47--52, 2012.Google ScholarCross Ref
- D. S. Frankel. Model Driven Architecture: Applying MDA to Enterprise Computing. John Wiley & Sons, Inc., 2003. Google ScholarDigital Library
- R. C. Gonçalves, D. Batory, and J. Sobral. ReFlO: An interactive tool for pipe-and-filter domain specification and program generation. submitted, 2013.Google Scholar
- J. Guo, K. Czarnecki, S. Apel, N. Siegmund, and A. Wasowski. Variability-aware performance prediction: A statistical learning approach. In Proceedings of IEEE/ACM International Conference on Automated Software Engineering (ASE), Nov 2013.Google ScholarDigital Library
- A. Kleppe, J. Warmer, and W. Bast. MDA Explained: The Model-Driven Architecture. Addison-Wesley, Boston, MA, 2003. Google ScholarDigital Library
- R. E. Korf. Artificial intelligence search algorithms. In In Algorithms and Theory of Computation Handbook. CRC Press, 1996.Google Scholar
- G. M. Lohman. Grammar-like functional rules for representing query optimization alternatives. In ACM SIGMOD, 1988. Google ScholarDigital Library
- B. R. Mandelbrot. The Fractal Geometry of Nature. Macmillian, Philadelphia, PA, USA, 1983.Google ScholarCross Ref
- B. Marker, D. Batory, and R. van de Geijn. A case study in mechanically deriving dense linear algebra code. International Journal of High Performance Computing Applications, 27(4):439--452, 2013. Google ScholarDigital Library
- B. Marker, D. Batory, and R. A. van de Geijn. Code generation and optimization of distributed-memory dense linear algebra kernels. In ICCS, 2013.Google ScholarCross Ref
- B. Marker, J. Poulson, D. Batory, and R. A. van de Geijn. Designing linear algebra algorithms by transformation: Mechanizing the expert developer. In VECPAR, 2012.Google Scholar
- T. Meng Low, B. Marker, and R. van de Geijn. FLAME Working Note #64. Theory and practice of fusing loops when optimizing parallel dense linear algebra operations. Technical Report TR-12-18, The University of Texas at Austin, Department of Computer Sciences, 2012.Google Scholar
- T. Menzies, D. Owen, and J. Richardson. The strangest thing about software. Computer, 40(1):54--60, January 2007. Google ScholarDigital Library
- S. Nedunuri, D. R. Smith, and W. R. Cook. Synthesis of greedy algorithms using dominance relations. In NASA Formal Methods, pages 97--108, 2010.Google Scholar
- S. Nedunuri, D. R. Smith, and W. R. Cook. Theory and techniques for synthesizing efficient breadth-first search algorithms. In FM, pages 308--325, 2012.Google ScholarCross Ref
- J. Poulson, B. Marker, R. A. van de Geijn, J. R. Hammond, and N. A. Romero. Elemental: A new framework for distributed memory dense matrix computations. ACM Trans. Math. Softw., 39(2):13:1--13:24, 2013. Google ScholarDigital Library
- J. Poulson, R. van de Geijn, and J. Bennighof. (Parallel) algorithms for reducing the generalized hermitian-definite eigenvalue problem. ACM Trans. on Math. Softw., 2012. submitted.Google Scholar
- M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", 93(2):232--275, 2005.Google ScholarCross Ref
- T. Riché, R. Goncalves, B. Marker, and D. Batory. Pushouts in Software Architecture Design. In GPCE, 2012. Google ScholarDigital Library
- N. Siegmund, S. S. Kolesnikov, C. Kästner, S. Apel, D. Batory, M. Rosenmüller, and G. Saake. Predicting performance via automated feature-interaction detection. In Proceedings of the 34th International Conference on Software Engineering, ICSE '12, pages 167--177, Piscataway, NJ, USA, 2012. IEEE Press. Google ScholarDigital Library
- M. Snir, S. W. Otto, S. Huss-Lederman, D. W. Walker, and J. Dongarra. MPI: The Complete Reference. The MIT Press, 1996. Google ScholarDigital Library
- R. A. van de Geijn and E. S. Quintana-Ortí. The Science of Programming Matrix Computations. www.lulu.com, 2008.Google Scholar
- F. Van Zee and R. van de Geijn. BLIS: A framework for rapid instantiation of blas functionality. ACM Trans. Math. Softw. accepted.Google Scholar
- R. C. Whaley and J. J. Dongarra. Automatically tuned linear algebra software. In Proceedings of SC'98, 1998. Google ScholarDigital Library
- R. Williams, C. P. Gomes, and B. Selman. Backdoors to typical case complexity. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI'03, pages 1173--1178, San Francisco, CA, USA, 2003. Morgan Kaufmann Publishers Inc. Google ScholarDigital Library
Index Terms
- Understanding performance stairs: elucidating heuristics
Recommendations
Multi-threading and one-sided communication in parallel LU factorization
SC '07: Proceedings of the 2007 ACM/IEEE conference on SupercomputingDense LU factorization has a high ratio of computation to communication and, as evidenced by the High Performance Linpack (HPL) benchmark, this property makes it scale well on most parallel machines. Nevertheless, the standard algorithm for this problem ...
Ascending and descending stairs for a biped robot
This paper synthesizes an efficient walking pattern for a practical biped robot when ascending and descending stairs. The main features of the biped robot include variable length legs and a translatable balance weight in the body. The biped robot's walk ...
Interactive gait rehabilitation system with a locomotion interface for training patients to climb stairs
This paper describes the development of a gait rehabilitation system with a locomotion interface (LI) for training patients to climb stairs. The LI consists of two 2-DOF manipulators equipped with footpads. These can move the patient's feet while his or ...
Comments