research-article

Parallel reduction to hessenberg form with algorithm-based fault tolerance

Authors:
Yulu Jia

University of Tennessee, Knoxville

University of Tennessee, Knoxville
View Profile

,
George Bosilca

University of Tennessee, Knoxville

University of Tennessee, Knoxville
View Profile

,
Piotr Luszczek

University of Tennessee, Knoxville

University of Tennessee, Knoxville
View Profile

,
Jack J. Dongarra

University of Tennessee, Knoxville, Oak Ridge National Laboratory and University of Manchester

University of Tennessee, Knoxville, Oak Ridge National Laboratory and University of Manchester
View Profile

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and AnalysisNovember 2013Article No.: 88Pages 1–11https://doi.org/10.1145/2503210.2503249

Published:17 November 2013Publication History

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Pages 1–11

ABSTRACT

This paper studies the resilience of a two-sided factorization and presents a generic algorithm-based approach capable of making two-sided factorizations resilient. We establish the theoretical proof of the correctness and the numerical stability of the approach in the context of a Hessenberg Reduction (HR) and present the scalability and performance results of a practical implementation. Our method is a hybrid algorithm combining an Algorithm Based Fault Tolerance (ABFT) technique with diskless checkpointing to fully protect the data. We protect the trailing and the initial part of the matrix with checksums, and protect finished panels in the panel scope with diskless checkpoints. Compared with the original HR (the ScaLAPACK PDGEHRD routine) our fault-tolerant algorithm introduces very little overhead, and maintains the same level of scalability. We prove that the overhead shows a decreasing trend as the size of the matrix or the size of the process grid increases.

References

E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. SIAM, Philadelphia, PA, Third edition, 1999. Google ScholarDigital Library
M. Berry and M. Browne. Understanding Search Engines: Mathematical Modeling and Text Retrieval. SIAM, Philadelphia, Second edition, 2005. Google ScholarDigital Library
C. H. Bischof and C. V. Loan. The WY Representation for Products of Householder Matrices. In Parallel Processing for Scientific Computing, pages 2--13, 1985. Google ScholarDigital Library
L. S. Blackford, J. Choi, A. Cleary, E. D'Azeuedo, J. Demmel, I. Dhillon, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK Users' Guide. SIAM, Philadelphia, PA, 1997. Google ScholarDigital Library
W. Bland, P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra. A checkpoint-on-failure protocol for algorithm-based recovery in standard MPI. In Proceedings of the 18th international conference on Parallel Processing, Euro-Par'12, pages 477--488, 2012. Google ScholarDigital Library
W. Bland, P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra. Extending the scope of the checkpoint-on-failure protocol for forward recovery in standard MPI. Technical Report UT-CS-12-702, University of Tennessee, 2012.Google Scholar
D. Boley, G. H. Golub, S. Makar, N. Saxena, and E. J. McCluskey. Floating point fault tolerance with backward error assertions. IEEE Transactions on Computers, 44(2):302--311, February 1995. Google ScholarDigital Library
G. Bosilca, A. Bouteiller, É. Brunet, F. Cappello, J. Dongarra, A. Guermouche, T. Hérault, Y. Robert, F. Vivien, and D. Zaidouni. Unified Model for Assessing Checkpointing Protocols at Extreme-Scale. Technical Report RR-7950, INRIA, October 2012.Google Scholar
G. Bosilca, R. Delmas, J. Dongarra, and J. Langou. Algorithm-based fault tolerance applied to high performance computing. J. Parallel Distrib. Comput., 69(4):410--416, April 2009. Google ScholarDigital Library
M. Bougeret, H. Casanova, Y. Robert, F. Vivien, and D. Zaidouni. Using Group Replication for Resilience on Exascale Systems. Technical Report RR-7876, INRIA, February 2012.Google Scholar
K. Braman, R. Byers, and R. Mathias. The multishift QR algorithm. ii. aggressive early deflation. SIAM J. Matrix Anal. Appl., 23:948--973, 2002. Google ScholarDigital Library
S. Brin and L. Page. The antaomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 33:107--17, 1998. Also available online at http://infolab.stanford.edu/pub/papers/google.pdf. Google ScholarDigital Library
K. Bryan and T. Leise. The $25,000,000,000 eigenvector. the linear algebra behind google. SIAM Review, 48(3):569--81, 2006. Also avaiable at http://www.rose-hulman.edu/~bryan/google.html. Google ScholarDigital Library
H. Casanova, Y. Robert, F. Vivien, and D. Zaidouni. Combining Process Replication and Checkpointing for Resilience on Exascale Systems. Technical Report RR-7951, INRIA, May 2012.Google Scholar
A. M. Castaldo, R. C. Whaley, and A. T. Chronopoulos. Reducing floating point error in dot product using the superblock family of algorithms. SIAM J. Sci. Comput., 31(2):1156--1174, 2008. Google ScholarDigital Library
Z. Chen. Scalable techniques for fault tolerant high performance computing. PhD thesis, University of Tennessee, Knoxville, TN, USA, 2006. Google ScholarDigital Library
T. Davies, C. Karlsson, H. Liu, C. Ding, and Z. Chen. High Performance Linpack Benchmark: A Fault Tolerant Implementation Without Checkpointing. In Proceedings of the International Conference on Supercomputing, ICS '11, pages 162--171, New York, NY, USA, 2011. ACM. Google ScholarDigital Library
J. Dongarra, P. Beckman, and T. Moore. The international Exascale software project roadmap. Int. J. High Perform. Comput. Appl., 25(1):3--60, February 2011. Google ScholarDigital Library
J. J. Dongarra, P. Luszczek, and A. Petitet. The LINPACK benchmark: Past, present, and future. Concurrency and Computation: Practice and Experience, 15:1--18, 2003.Google ScholarCross Ref
J. J. Dongarra and R. A. van de Geijn. Reduction to condensed form for the eigenvalue problem on distributed memory architectures. Parallel Computing, 18(9):973--982, 1992.Google ScholarCross Ref
P. Du, A. Bouteiller, G. Bosilca, T. Herault, and J. Dongarra. Algorithm-based fault tolerance for dense matrix factorizations. SIGPLAN Not., 47(8):225--234, February 2012. Google ScholarDigital Library
J. G. F. Francis. The QR transformation a unitary analogue to the LR transformation. I. Comput. J., 4:265--271, 1961.Google ScholarCross Ref
J. G. F. Francis. The QR transformation II. Comput. J., 4:332--345, 1962.Google ScholarCross Ref
G. H. Golub and C. F. V. Loan. Matrix Computations. The John Hopkins University Press, 4th edition, December 27 2012. ISBN-10: 1421407949, ISBN-13: 978-1421407944.Google Scholar
L. A. B. Gomez, N. Maruyama, F. Cappello, and S. Matsuoka. Distributed Diskless Checkpoint for Large Scale Systems. In Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on, pages 63--72, 2010. Google ScholarDigital Library
R. Granat, B. Kågström, and D. Kressner. A novel parallel QR algorithm for hybrid distributed memory HPC systems. SIAM J. Sci. Comput., 32:2345--2378, 2010. Google ScholarDigital Library
R. Granat, B. Kågström, D. Kressner, and M. Shao. Parallel library software for the multishift QR algorithm with aggressive early deflation. Technical Report UMINF-12.06, Dept. of Computing Science, Umeøa University, Sweden, 2012.Google Scholar
D. Hakkarinen and Z. Chen. Algorithmic Cholesky Factorization Fault Recovery. In Parallel Distributed Processing (IPDPS), 2010 IEEE International Symposium on, pages 1--10, April 2010.Google ScholarCross Ref
K.-H. Huang and J. A. Abraham. Algorithm-based fault tolerance for matrix operations. IEEE Trans. Comput., 33(6):518--528, June 1984. Google ScholarDigital Library
R. M. K. Braman, R. Byers. The multishift QR algorithm. I. maintaining well-focused shifts and level 3 performance. SIAM J. Matrix Anal. Appl., 23:929--947, 2002. Google ScholarDigital Library
B. Kågström, D. Kressner, and M. Shao. On aggressive early deflation in parallel variants of the QR algorithm. In PARA 2010, Applied Parallel and Scientific Computing, LNCS, volume 7134, pages 1--10. Springer, 2012. Google ScholarDigital Library
Y. Kim, J. S. Plank, and J. Dongarra. Fault Tolerant Matrix Operations Using Checksum and Reverse Computation. In 6th Symposium on the Frontiers of Massively Parallel Computation, pages 70--77, Annapolis, MD, October 1996. Google ScholarDigital Library
J. Langou, Z. Chen, G. Bosilca, and J. Dongarra. Recovery patterns for iterative methods in a parallel unstable environment. SIAM J. Sci. Comput., 30(1):102--116, November 2007. Google ScholarDigital Library
A. Langville and C. Meyer. Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, 2006. Google ScholarDigital Library
C.-D. Lu. Scalable Diskless Checkpointing for Large Parallel Systems. PhD thesis, University of Illinois, Urbana, Illinois, USA, 2005. Google ScholarDigital Library
F. T. Luk and H. Park. Fault-tolerant matrix triangularizations on systolic arrays. IEEE Trans. Comput., 37(11):1434--1438, November 1988. Google ScholarDigital Library
E. Meneses. Clustering Parallel Applications to Enhance Message Logging Protocol. https://wiki.ncsa.illinois.edu/download/attachments/17630761/INRIA-UIUC-WS4-emenese.pdf?version=1\&modificationDate=1290466786000.Google Scholar
A. Petitet, C. Whaley, J. Dongarra, and A. Cleary. HPL - a Portable Implementation of the High-Performance Linpack Benchmark for Distributed-memory Computers, September 2008. http://www.netlib.org/benchmark/hpl/.Google Scholar
J. S. Plank, K. Li, and M. A. Puening. Diskless checkpointing. IEEE Trans. Parallel Distrib. Syst., 9(10):972--986, October 1998. Google ScholarDigital Library
R. Schreiber and C. V. Loan. A storage efficient WY representation for products of householder transformations. SIAM Journal on Scientific and Statistical Computing, 10, 1989. Google ScholarDigital Library
B. Schroeder and G. A. Gibson. Understanding failures in petascale computers. Journal of Physics: Conference Series, 78, 2007.Google Scholar
G. W. Stewart. Matrix Algorithms, Volume II: Eigensystems. SIAM: Society for Industrial and Applied Mathematics, First edition, August 2001. Google ScholarDigital Library
U. von Luxburg. A tutorial on spectral clustering. Stat. Comput., 17:395--416, 2007. Google ScholarDigital Library
J. H. Wilkinson. The Algebraic Eigenvalue Problem. Oxford University Press, Inc., New York, NY, USA, 1988. ISBN-10: 0198534183, ISBN-13: 978-0198534181. Google ScholarDigital Library
J. H. Wilkinson. Rounding Errors in Algebraic Processes. Dover, New York, 1994. Google ScholarDigital Library
E. Yao, R. Wang, M. Chen, G. Tan, and N. Sun. A Case Study of Designing Efficient Algorithm-based Fault Tolerant Application for Exascale Parallelism. In Parallel Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International, pages 438--448, May 2012. Google ScholarDigital Library

Index Terms

Parallel reduction to hessenberg form with algorithm-based fault tolerance
1. Mathematics of computing
  1. Mathematical software

Recommendations

Orthogonal Hessenberg Reduction and Orthogonal Krylov Subspace Bases

We study necessary and sufficient conditions that a nonsingular matrix A can be B-orthogonally reduced to upper Hessenberg form with small bandwidth. By this we mean the existence of a decomposition AV=VH, where H is upper Hessenberg with few nonzero ...
Read More
Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing

We present a Hessenberg reduction (HR) algorithm for hybrid systems of homogeneous multicore with GPU accelerators that can exceed 25x the performance of the corresponding LAPACK algorithm running on current homogeneous multicores. This enormous ...
Read More
FT-ScaLAPACK: correcting soft errors on-line for ScaLAPACK cholesky, QR, and LU factorization routines
HPDC '14: Proceedings of the 23rd international symposium on High-performance parallel and distributed computing

It is well known that soft errors in linear algebra operations can be detected off-line at the end of the computation using algorithm-based fault tolerance (ABFT). However, traditional ABFT usually cannot correct errors in Cholesky, QR, and LU ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2013
1123 pages
ISBN:9781450323789
DOI:10.1145/2503210
General Chair:
William Gropp
University of Illinois at Urbana-Champaign, Urbana, Illinois
,
Program Chair:
Satoshi Matsuoka
Tokyo Institute of Technology, Tokyo, Japan
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 November 2013
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
ScaLAPACK
algorithm-based fault tolerance
dense linear algebra
hessenberg reduction
parallel numerical libraries
Qualifiers
- research-article
Conference

Acceptance Rates
SC '13 Paper Acceptance Rate91of449submissions,20%Overall Acceptance Rate1,516of6,373submissions,24%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 8
  Total Citations
  View Citations
- 145
  Total Downloads
- Downloads (Last 12 months)3
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Parallel reduction to hessenberg form with algorithm-based fault tolerance

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

ABSTRACT

References

Cited By

Index Terms

Recommendations

Orthogonal Hessenberg Reduction and Orthogonal Krylov Subspace Bases

Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing

FT-ScaLAPACK: correcting soft errors on-line for ScaLAPACK cholesky, QR, and LU factorization routines