Abstract
The square root of a sum of squares is well known to be prone to overflow and underflow. Ad hoc scaling of intermediate results, as has been done in numerical software such as the BLAS and LAPACK, mostly avoids the problem, but it can still occur at extreme values in the range of representable numbers. More careful scaling, as has been implemented in recent versions of the standard algorithms, may come at the expense of performance or clarity. This work reimplements the vector 2-norm and the generation of Givens rotations from the Level 1 BLAS to improve their performance and design. In addition, support for negative increments is extended to the Level 1 BLAS operations on a single vector, and a comprehensive test suite for all the Level 1 BLAS is included.
Supplemental Material
Available for Download
Software for Safe Scaling in the Level 1 BLAS
- E. Anderson. 2002. LAPACK3E—A Fortran 90-enhanced Version of LAPACK. UT-CS-02--497 (LAPACK Working Note 158). University of Tennessee, Knoxville.Google Scholar
- E. Anderson and M. Fahey. 1997. Performance Improvements to LAPACK for the Cray Scientific Library. UT-CS-97-359 (LAPACK Working Note 126). University of Tennessee, Knoxville.Google Scholar
- E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. 1999. LAPACK Users’ Guide (3rd ed.). SIAM, Philadelphia. Google ScholarCross Ref
- D. Bindel, J. Demmel, W. Kahan, and O. Marques. 2002, On computing givens rotations reliably and efficiently, ACM Trans. Math. Softw. 28, 2 (Jun. 2002), 206--238. Google ScholarDigital Library
- L. S. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo, K. Remington, and R. C. Whaley. 2002. An updated set of basic linear algebra subprograms (BLAS). ACM Trans. Math. Softw. 28, 2 (Jun. 2002), 135--151. Google ScholarDigital Library
- J. L. Blue. 1978. A portable fortran program to find the euclidean norm of a vector. ACM Trans. Math. Softw. 4, 1 (Mar. 1978), 15--23. Google ScholarDigital Library
- W. J. Cody. 1988. MACHAR: A subroutine to dynamically determine machine parameters. ACM Trans. Math. Softw. 14, 4 (Dec. 1988), 303--311. Google ScholarDigital Library
- Cray Research Inc. 1993. Scientific Libraries Reference Manual. SR-2081 8.0.Google Scholar
- J. Demmel. 1997. Applied Numerical Linear Algebra. SIAM, Philadelphia. Google ScholarCross Ref
- J. Demmel. 2002. Software for Accurate and Efficient Givens Rotations. Retrieved from http://www.cs.berkeley.edu/∼demmel/Givens.Google Scholar
- J. J. Dongarra. 1980. Fortran BLAS Timing. ANL-80-24 (LINPACK Working Note #3). Argonne National Laboratory, Argonne, IL. Google ScholarCross Ref
- J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart. 1979. LINPACK Users’ Guide. SIAM, Philadelphia. Google ScholarCross Ref
- J. J. Dongarra, J. Du Croz, S. Hammarling, and I. Duff. 1990. A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16, 1 (Mar. 1990), 1--17. Google ScholarDigital Library
- J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson. 1988. An extended set of fortran basic linear algebra subprograms. ACM Trans. Math. Softw. 14, 1 (Mar. 1988), 1--17. Google ScholarDigital Library
- J. J. Dongarra and E. Grosse. 1987. Distribution of mathematical software via electronic mail. Commun. ACM 30, 403--407. Google ScholarDigital Library
- P. A. Fox, A. D. Hall, and N. L. Schryer. 1978. Algorithm 528: Framework for a portable library. ACM Trans. Math. Softw. 4, 2 (Jun. 1978), 177--188. Google ScholarDigital Library
- R. J. Hanson and T. Hopkins. 2004. Algorithm 830: Another visit with standard and modified givens transformations and a remark on algorithm 539. ACM Trans. Math. Softw. 30, 1 (Mar. 2004), 86--94. Google ScholarDigital Library
- N. J. Higham. 1996. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia.Google Scholar
- T. Hopkins. 1997. Restructuring the BLAS level 1 routine for computing the modified givens transformation. ACM SIGNUM Newslett. 32, 4 (Oct. 1997), 2--14. Google ScholarDigital Library
- IBM. 2012. IBM Engineering and Scientific Subroutine Library (ESSL) Guide and Reference, Version 5.1 Release 1.Google Scholar
- IEEE. 2008. IEEE Standard for Floating-Point Arithmetic. Institute for Electrical and Electronics Engineers Inc., New York, NY.Google Scholar
- Intel. 2015. Intel Math Kernel Library Reference Manual.Google Scholar
- C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh. 1979. Basic linear algebra subprograms for fortran usage. ACM Trans. Math. Soft. 5, 3 (Sep. 1979), 308--323. Google ScholarDigital Library
- D. McGuckin. 2010. N-Dimensional Euclidean Norms. (personal communication).Google Scholar
- Q. Sheikh, P. Vu, C. Yang, and M. Merchant. 1992. Implementation of the level 2 and 3 BLAS on the CRAY Y-MP and CRAY-2. J. Supercomput. 5 (1992), 291--305. Google ScholarDigital Library
- M. H. Sujon, R. C. Whaley, and Q. Yi. 2013. Vectorization past dependent branches through speculation. In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 353--362.Google Scholar
- R. C. Whaley, A. Petitet, and J. J. Dongarra. 2001. Automated empirical optimization of software and the ATLAS project. Parallel Comput. 27, 1--2 (2001), 3--35.Google ScholarCross Ref
Index Terms
- Algorithm 978: Safe Scaling in the Level 1 BLAS
Recommendations
A numerical solver for general bordered tridiagonal matrix equations
To overcome several limitations of symbolic algorithms introduced recently for matrices of large order, a fast numerical solver is proposed for the matrix linear equation AX=B, where the nn coefficient matrix A is a general nonsingular bordered ...
Numerical study on incomplete orthogonal factorization preconditioners
We design, analyse and test a class of incomplete orthogonal factorization preconditioners constructed from Givens rotations, incorporating some dropping strategies and updating tricks, for the solution of large sparse systems of linear equations. ...
On General Row Merging Schemes for Sparse Givens Transformations
This paper introduces general row merging schemes for the $QR$ decomposition of sparse matrices by Givens rotations. They can be viewed as a generalization of row rotations to submatrix rotations (or merging) in the recent method by George and Heath [12]...
Comments