research-article

Sparse matrix factorization on massively parallel computers

Authors:
Anshul Gupta

IBM Watson Research Center, Yorktown Heights, NY

IBM Watson Research Center, Yorktown Heights, NY
View Profile

,
Seid Koric

University of Illinois, Urbana, IL

University of Illinois, Urbana, IL
View Profile

,
Thomas George

Texas A & M University, College Station, TX

Texas A & M University, College Station, TX
View Profile

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and AnalysisNovember 2009Article No.: 1Pages 1–12https://doi.org/10.1145/1654059.1654061

Published:14 November 2009Publication History

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

Pages 1–12

ABSTRACT

Direct methods for solving sparse systems of linear equations have a high asymptotic computational and memory requirements relative to iterative methods. However, systems arising in some applications, such as structural analysis, can often be too ill-conditioned for iterative solvers to be effective. We cite real applications where this is indeed the case, and using matrices extracted from these applications to conduct experiments on three different massively parallel architectures, show that a well designed sparse factorization algorithm can attain very high levels of performance and scalability. We present strong scalability results for test data from real applications on up to 8,192 cores, along with both analytical and experimental weak scalability results for a model problem on up to 16,384 cores---an unprecedented number for sparse factorization. For the model problem, we also compare experimental results with multiple analytical scaling metrics and distinguish between some commonly used weak scaling methods.

References

Blue Waters: Sustained Petascale Computing. National Center for Supercomputing Applications, University of Illinois, http://www.ncsa.uiuc.edu/Blue Waters.Google Scholar
MSC Software Products: Engineering Analysis. MSC Software Corporation, Santa Ana, CA. http://www.mscsoftware.com/products.Google Scholar
P. R. Amestoy, T. A. Davis, and I. S. Duff. An approximate minimum degree ordering algorithm. SIAM Journal on Matrix Analysis and Applications, 17(4):886--905, 1996. Google ScholarDigital Library
P. R. Amestoy, I. S. Duff, J. Koster, and J. Y. L'Excellent. A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM Journal on Matrix Analysis and Applications, 23(1):15--41, 2001. Google ScholarDigital Library
P. R. Amestoy, I. S. Duff, and J. Y. L'Excellent. Multifrontal parallel distributed symmetric and unsymmetric solvers. Computational Methods in Applied Mechanical Engineering, 184:501--520, 2000.Google ScholarCross Ref
E. Chow. Parallel implementation and practical use of sparse approximate inverse preconditioners with a priori sparsity patterns. International Journal of High Performance Computing Applications, 15(1):56--74, 2001. Google ScholarDigital Library
J. J. Dongarra, J. D. Croz, S. Hammarling, and I. S. Duff. A set of level 3 Basic Linear Algebra Subprograms. ACM Transactions on Mathematical Software, 16(1):1--17, 1990. Google ScholarDigital Library
I. S. Duff and J. K. Reid. The multifrontal solution of indefinite sparse symmetric linear equations. ACM Transactions on Mathematical Software, 9(3):302--325, 1983. Google ScholarDigital Library
R. D. Falgout and U. M. Yang. hypre, High Performance Preconditioners: Users manual. Technical report, Lawrence Livermore National Laboratory, 2006. Paper appears in ICCS 2002.Google Scholar
M. Faverge and P. Ramet. Dynamic scheduling for sparse direct solver on NUMA architectures. In Proceedings of PARA '2008, Trondheim, Norway.Google Scholar
J. Fettig, S. Koric, and N. Sobh. A comparison of direct and iterative linear sparse solvers in computational solid mechanics. In International Conference on Preconditioning Techniques for Large Sparse Matrix Problems, Napa, CA, 2003.Google Scholar
A. George and J. W.-H. Liu. Computer Solution of Large Sparse Positive Definite Systems. Prentice-Hall, NJ, 1981. Google ScholarDigital Library
T. George, A. Gupta, and V. Sarin. An empirical analysis of iterative solver performance for SPD systems. Technical Report RC 24737, IBM T. J. Watson Research Center, Yorktown Heights, NY, January 2009.Google Scholar
J. R. Gilbert and S. Toledo. An assessment of incomplete-LU preconditioners for nonsymmetric linear systems. Informatica, 24(3):409--425, 2000.Google Scholar
A. Gupta. A shared- and distributed-memory parallel sparse direct solver. Applicable Algebra in Engineering, Communication, and Computing, 18(3):263--277, 2007. Google ScholarDigital Library
A. Gupta. Fast and effective algorithms for graph partitioning and sparse matrix ordering. IBM Journal of Research and Development, 41(1/2):171--183, January/March, 1997. Google ScholarDigital Library
A. Gupta. WSMP: Watson sparse matrix package (Part-I: Direct solution of symmetric sparse systems). Technical Report RC 21886, IBM T. J. Watson Research Center, Yorktown Heights, NY, November 2000. http://www.cs.umn.edu/~agupta/wsmp.Google Scholar
A. Gupta, M. Joshi, and V. Kumar. WSMP: A high-performance serial and parallel symmetric sparse linear solver. In B. Kagstrom, J. J. Dongarra, E. Elmroth, and J. Wasniewski, editors, PARA '98 Workshop on Applied Parallel Computing in Large Scale Scientific and Industrial Problems. Springer Verlag, 1998. Google ScholarDigital Library
A. Gupta, G. Karypis, and V. Kumar. Highly scalable parallel algorithms for sparse matrix factorization. IEEE Transactions on Parallel and Distributed Systems, 8(5):502--520, May 1997. Google ScholarDigital Library
J. L. Gustafson. Reevaluating Amdahl's law. Communications of the ACM, 31(5):532--533, 1988. Google ScholarDigital Library
J. L. Gustafson, G. R. Montry, and R. E. Benner. Development of parallel methods for a 1024-processor hypercube. SIAM Journal on Scientific and Statistical Computing, 9(4):609--638, 1988.Google ScholarDigital Library
P. Hénon, P. Ramet, and J. Roman. PaStiX: A high-performance parallel direct solver for sparse symmetric definite systems. Parallel Computing, 28(2):301--321, 2002. Google ScholarDigital Library
V. E. Henson and U. M. Yang. BoomerAMG: A parallel algebraic multigrid solver and preconditioner. Applied Numerical Mathematics: Transactions of IMACS, 41(1): 155--177, 2002. Google ScholarDigital Library
D. Hysom and A. Pothen. A scalable parallel algorithm for incomplete factor preconditioning. SIAM Journal on Scientific Computing, 22(6):2194--2215, 2000. Google ScholarDigital Library
M. Joshi, A. Gupta, G. Karypis, and V. Kumar. Two-dimensional scalable parallel algorithms for solution of triangular systems. In Proceedings of the 1997 International Conference on High Performance Computing (HiPC), 1997. Google ScholarDigital Library
G. Karypis and V. Kumar. ParMETIS: Parallel graph partitioning and sparse matrix ordering library. Technical Report TR 97-060, Department of Computer Science, University of Minnesota, 1997.Google Scholar
V. Kumar and A. Gupta. Analyzing scalability of parallel algorithms and architectures. Journal of Parallel and Distributed Computing, 22(3):379--391, 1994. Google ScholarDigital Library
V. Kumar and V. N. Rao. Parallel depth-first search, part II: Analysis. International Journal of Parallel Programming, 16(6):501--519, 1987. Google ScholarDigital Library
G. Lakner and C. P. Sosa. IBM System Blue Gene Solution: Blue Gene/P Application Development. IBM, 2008. http://www.redbooks.ibm.com/abstracts/sg247287.html. Google ScholarDigital Library
X. S. Li and J. W. Demmel. Making sparse Gaussian elimination scalable by static pivoting. In Supercomputing '98 Proceedings, 1998. Google ScholarDigital Library
X. S. Li and J. W. Demmel. A scalable sparse direct solver using static pivoting. In Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999.Google Scholar
R. J. Lipton, D. J. Rose, and R. E. Tarjan. Generalized nested dissection. SIAM Journal on Numerical Analysis, 16:346--358, 1979.Google ScholarCross Ref
J. W.-H. Liu. The role of elimination trees in sparse factorization. SIAM Journal on Matrix Analysis and Applications, 11:134--172, 1990. Google ScholarDigital Library
J. W.-H. Liu. The multifrontal method for sparse matrix solution: Theory and practice. SIAM Review, 34(1):82--109, 1992. Google ScholarDigital Library
J. W. Ruge and K. Stüben. Multigrid methods. In S. F. McCormick, editor, Frontiers in Applied Mathematics, volume 3, pages 73--130. SIAM, Philadelphia, PA, 1987.Google Scholar
Y. Saad. Iterative Methods for Sparse Linear Systems, 2nd edition. SIAM, 2003. Google ScholarDigital Library
J. Schulze. Towards a tighter coupling of bottom-up and top-down sparse matrix ordering methods. Bit Numerical Mathematics, 41(4):800--841, 2001.Google ScholarCross Ref
X.-H. Sun and J. L. Gustafson. Toward a better parallel performance metric. Parallel Computing, 17(2):1093--1109, 1991.Google ScholarDigital Library
X.-H. Sun and L. M. Ni. Another view of parallel speedup. In Supercomputing '90 Proceedings, pages 324--333, 1990. Google ScholarDigital Library
X.-H. Sun and L. M. Ni. Scalable problems and memory-bounded speedup. Journal of Parallel and Distributed Computing, 19(1):27--37, 1993. Google ScholarDigital Library
X.-H. Sun and D. T. Rover. Scalability of parallel algorithm-machine combinations. IEEE Transactions on Parallel and Distributed Systems, 5(6):599--613, 1994. Google ScholarDigital Library
P. H. Worley. The effect of time constraints on scaled speedup. SIAM Journal on Scientific and Statistical Computing, 11(5):838--858, 1990. Google ScholarDigital Library

Index Terms

Recommendations

Highly Scalable Parallel Algorithms for Sparse Matrix Factorization

In this paper, we describe scalable parallel algorithms for symmetric sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1,024 processors on a Cray T3D parallel computer. Through our ...
Read More
A sparse-sparse iteration for computing a sparse incomplete factorization of the inverse of an SPD matrix

In this paper, a method via sparse-sparse iteration for computing a sparse incomplete factorization of the inverse of a symmetric positive definite matrix is proposed. The resulting factorized sparse approximate inverse is used as a preconditioner for ...
Read More
On sparse matrix reordering for parallel factorization
ICS '94: Proceedings of the 8th international conference on Supercomputing

To minimize the amount of computation and storage for parallel sparse factorization, sparse matrices have to be reordered prior to factorization. We show that none of the popular ordering heuristics proposed before, namely, mulitple minimum degree and ...
Read More

Reviews

Reviewer: James Harold Davenport

Gupta, Koric, and George consider direct methods for solving large sparse linear systems. As the authors themselves admit that "direct methods have a high asymptotic computational and memory requirement relative to iterative methods," the question to ask is why. Because, as the authors conclusively demonstrate in Section 3, real-world examples exist-in particular, three-dimensional finite element problems-that either cannot be solved iteratively, no matter how much preconditioning is attempted, or where the iterative solution takes longer. Therefore, they explore the performance of their Watson sparse matrix package against two other direct solvers, on a range of symmetric positive definite matrices, on a range of machines, going up to 16,384 cores. The results are too detailed to describe here, but what I took away is the message that at least in this range, one still gets linear speedup, provided that the problem size is allowed to grow with the number of processors. I have one minor quibble: while Gupta, Koric, and George are careful to talk about cores and nodes in the description of the machines, they too often use the term "CPU," forcing a detailed search to discover what they mean. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
November 2009
778 pages
ISBN:9781605587448
DOI:10.1145/1654059
Conference Chair:
Wilfred Pinfold
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 November 2009
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
error correction
last-level caches
reliability
soft error
Qualifiers
- research-article
Conference

Acceptance Rates
SC '09 Paper Acceptance Rate59of261submissions,23%Overall Acceptance Rate1,516of6,373submissions,24%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 21
  Total Citations
  View Citations
- 46
  Total Downloads
- Downloads (Last 12 months)5
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Sparse matrix factorization on massively parallel computers

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

ABSTRACT

References

Cited By

Index Terms

Recommendations

Highly Scalable Parallel Algorithms for Sparse Matrix Factorization

A sparse-sparse iteration for computing a sparse incomplete factorization of the inverse of an SPD matrix

On sparse matrix reordering for parallel factorization

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Sparse matrix factorization on massively parallel computers

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

ABSTRACT

References

Cited By

Index Terms

Recommendations

Highly Scalable Parallel Algorithms for Sparse Matrix Factorization

A sparse-sparse iteration for computing a sparse incomplete factorization of the inverse of an SPD matrix

On sparse matrix reordering for parallel factorization

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media