ABSTRACT
Direct methods for solving sparse systems of linear equations have high asymptotic computational and memory requirements relative to iterative methods. However, systems arising in some applications, such as structural analysis, are often too ill-conditioned for iterative solvers to be effective. We cite real applications where this is the case and, using matrices extracted from these applications to conduct experiments on three different massively parallel architectures, show that a well-designed sparse factorization algorithm can attain very high levels of performance and scalability. We present strong scalability results for test data from real applications on up to 8,192 cores, along with both analytical and experimental weak scalability results for a model problem on up to 16,384 cores---an unprecedented number for sparse factorization. For the model problem, we also compare experimental results against multiple analytical scaling metrics and distinguish between some commonly used weak scaling methods.
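The distinction between fixed-size (strong) scaling and scaled (weak) scaling that the abstract alludes to can be illustrated with two textbook speedup models. The sketch below is for illustration only; the formulas are the classic Amdahl and Gustafson models, and the serial fractions used are arbitrary examples, not results from this work.

```python
def amdahl_speedup(serial_frac, p):
    """Fixed-size (strong) scaling: the total problem size stays
    constant as the process count p grows, so the serial fraction
    eventually dominates and speedup saturates at 1/serial_frac."""
    return 1.0 / (serial_frac + (1.0 - serial_frac) / p)

def gustafson_speedup(serial_frac, p):
    """Scaled (weak) scaling: the work per process stays constant as
    p grows, so the parallel portion of the (scaled) run expands and
    speedup grows nearly linearly in p."""
    return serial_frac + p * (1.0 - serial_frac)

# With a hypothetical 5% serial fraction, strong scaling stalls below
# a speedup of 20 even on thousands of cores, while the scaled-speedup
# model keeps growing with p.
print(amdahl_speedup(0.05, 1024))     # saturates near 1/0.05 = 20
print(gustafson_speedup(0.05, 1024))  # grows roughly linearly in p
```

Memory-bounded variants of weak scaling (where the problem is grown to fill available memory rather than to fix work per process) behave differently again for sparse factorization, whose flop count grows faster than its memory footprint; that gap is one reason the choice of weak scaling method matters.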
Index Terms
- Sparse matrix factorization on massively parallel computers