research-article

Algorithm 978: Safe Scaling in the Level 1 BLAS

Author:
Edward Anderson

Lockheed Martin

Lockheed Martin

0000-0002-2180-2438
View Profile

Authors Info & Claims

ACM Transactions on Mathematical Software Volume 44 Issue 1Article No.: 12pp 1–28https://doi.org/10.1145/3061665

Published:24 July 2017Publication History

ACM Transactions on Mathematical Software

Abstract

The square root of a sum of squares is well known to be prone to overflow and underflow. Ad hoc scaling of intermediate results, as has been done in numerical software such as the BLAS and LAPACK, mostly avoids the problem, but it can still occur at extreme values in the range of representable numbers. More careful scaling, as has been implemented in recent versions of the standard algorithms, may come at the expense of performance or clarity. This work reimplements the vector 2-norm and the generation of Givens rotations from the Level 1 BLAS to improve their performance and design. In addition, support for negative increments is extended to the Level 1 BLAS operations on a single vector, and a comprehensive test suite for all the Level 1 BLAS is included.

Supplemental Material

Available for Download

zip

978.zip (399.1 KB)

Software for Safe Scaling in the Level 1 BLAS

References

E. Anderson. 2002. LAPACK3E—A Fortran 90-enhanced Version of LAPACK. UT-CS-02--497 (LAPACK Working Note 158). University of Tennessee, Knoxville.Google Scholar
E. Anderson and M. Fahey. 1997. Performance Improvements to LAPACK for the Cray Scientific Library. UT-CS-97-359 (LAPACK Working Note 126). University of Tennessee, Knoxville.Google Scholar
E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. 1999. LAPACK Users’ Guide (3rd ed.). SIAM, Philadelphia. Google ScholarCross Ref
D. Bindel, J. Demmel, W. Kahan, and O. Marques. 2002, On computing givens rotations reliably and efficiently, ACM Trans. Math. Softw. 28, 2 (Jun. 2002), 206--238. Google ScholarDigital Library
L. S. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo, K. Remington, and R. C. Whaley. 2002. An updated set of basic linear algebra subprograms (BLAS). ACM Trans. Math. Softw. 28, 2 (Jun. 2002), 135--151. Google ScholarDigital Library
J. L. Blue. 1978. A portable fortran program to find the euclidean norm of a vector. ACM Trans. Math. Softw. 4, 1 (Mar. 1978), 15--23. Google ScholarDigital Library
W. J. Cody. 1988. MACHAR: A subroutine to dynamically determine machine parameters. ACM Trans. Math. Softw. 14, 4 (Dec. 1988), 303--311. Google ScholarDigital Library
Cray Research Inc. 1993. Scientific Libraries Reference Manual. SR-2081 8.0.Google Scholar
J. Demmel. 1997. Applied Numerical Linear Algebra. SIAM, Philadelphia. Google ScholarCross Ref
J. Demmel. 2002. Software for Accurate and Efficient Givens Rotations. Retrieved from http://www.cs.berkeley.edu/&sim;demmel/Givens.Google Scholar
J. J. Dongarra. 1980. Fortran BLAS Timing. ANL-80-24 (LINPACK Working Note #3). Argonne National Laboratory, Argonne, IL. Google ScholarCross Ref
J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart. 1979. LINPACK Users’ Guide. SIAM, Philadelphia. Google ScholarCross Ref
J. J. Dongarra, J. Du Croz, S. Hammarling, and I. Duff. 1990. A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw. 16, 1 (Mar. 1990), 1--17. Google ScholarDigital Library
J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson. 1988. An extended set of fortran basic linear algebra subprograms. ACM Trans. Math. Softw. 14, 1 (Mar. 1988), 1--17. Google ScholarDigital Library
J. J. Dongarra and E. Grosse. 1987. Distribution of mathematical software via electronic mail. Commun. ACM 30, 403--407. Google ScholarDigital Library
P. A. Fox, A. D. Hall, and N. L. Schryer. 1978. Algorithm 528: Framework for a portable library. ACM Trans. Math. Softw. 4, 2 (Jun. 1978), 177--188. Google ScholarDigital Library
R. J. Hanson and T. Hopkins. 2004. Algorithm 830: Another visit with standard and modified givens transformations and a remark on algorithm 539. ACM Trans. Math. Softw. 30, 1 (Mar. 2004), 86--94. Google ScholarDigital Library
N. J. Higham. 1996. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia.Google Scholar
T. Hopkins. 1997. Restructuring the BLAS level 1 routine for computing the modified givens transformation. ACM SIGNUM Newslett. 32, 4 (Oct. 1997), 2--14. Google ScholarDigital Library
IBM. 2012. IBM Engineering and Scientific Subroutine Library (ESSL) Guide and Reference, Version 5.1 Release 1.Google Scholar
IEEE. 2008. IEEE Standard for Floating-Point Arithmetic. Institute for Electrical and Electronics Engineers Inc., New York, NY.Google Scholar
Intel. 2015. Intel Math Kernel Library Reference Manual.Google Scholar
C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh. 1979. Basic linear algebra subprograms for fortran usage. ACM Trans. Math. Soft. 5, 3 (Sep. 1979), 308--323. Google ScholarDigital Library
D. McGuckin. 2010. N-Dimensional Euclidean Norms. (personal communication).Google Scholar
Q. Sheikh, P. Vu, C. Yang, and M. Merchant. 1992. Implementation of the level 2 and 3 BLAS on the CRAY Y-MP and CRAY-2. J. Supercomput. 5 (1992), 291--305. Google ScholarDigital Library
M. H. Sujon, R. C. Whaley, and Q. Yi. 2013. Vectorization past dependent branches through speculation. In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 353--362.Google Scholar
R. C. Whaley, A. Petitet, and J. J. Dongarra. 2001. Automated empirical optimization of software and the ATLAS project. Parallel Comput. 27, 1--2 (2001), 3--35.Google ScholarCross Ref

Index Terms

Algorithm 978: Safe Scaling in the Level 1 BLAS
1. Mathematics of computing
  1. Mathematical software
    1. Mathematical software performance
2. Software and its engineering
  1. Software notations and tools
    1. Software libraries and repositories

Recommendations

A numerical solver for general bordered tridiagonal matrix equations

To overcome several limitations of symbolic algorithms introduced recently for matrices of large order, a fast numerical solver is proposed for the matrix linear equation AX=B, where the nn coefficient matrix A is a general nonsingular bordered ...
Read More
Numerical study on incomplete orthogonal factorization preconditioners

We design, analyse and test a class of incomplete orthogonal factorization preconditioners constructed from Givens rotations, incorporating some dropping strategies and updating tricks, for the solution of large sparse systems of linear equations. ...
Read More
On General Row Merging Schemes for Sparse Givens Transformations

This paper introduces general row merging schemes for the $QR$ decomposition of sparse matrices by Givens rotations. They can be viewed as a generalization of row rotations to submatrix rotations (or merging) in the recent method by George and Heath [12]...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM Transactions on Mathematical Software Volume 44, Issue 1
March 2018
308 pages
ISSN:0098-3500
EISSN:1557-7295
DOI:10.1145/3071076
Editor:
Daniel Kressner
EPF Lausanne, Switzerland
Issue’s Table of Contents
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 July 2017
- Revised: 1 March 2017
- Accepted: 1 March 2017
- Received: 1 June 2015
Published in toms Volume 44, Issue 1

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Badges
- Artifacts Available
- Artifacts Evaluated & Reusable
Author Tags
Givens rotation
Vector operation
sum of squares
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 1
  Total Citations
  View Citations
- 295
  Total Downloads
- Downloads (Last 12 months)29
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Algorithm 978: Safe Scaling in the Level 1 BLAS

ACM Transactions on Mathematical Software

Abstract

Supplemental Material

Available for Download

References

Cited By

Index Terms

Recommendations

A numerical solver for general bordered tridiagonal matrix equations

Numerical study on incomplete orthogonal factorization preconditioners

On General Row Merging Schemes for Sparse Givens Transformations