skip to main content
10.1145/1654059.1654061acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

Sparse matrix factorization on massively parallel computers

Published:14 November 2009Publication History

ABSTRACT

Direct methods for solving sparse systems of linear equations have a high asymptotic computational and memory requirements relative to iterative methods. However, systems arising in some applications, such as structural analysis, can often be too ill-conditioned for iterative solvers to be effective. We cite real applications where this is indeed the case, and using matrices extracted from these applications to conduct experiments on three different massively parallel architectures, show that a well designed sparse factorization algorithm can attain very high levels of performance and scalability. We present strong scalability results for test data from real applications on up to 8,192 cores, along with both analytical and experimental weak scalability results for a model problem on up to 16,384 cores---an unprecedented number for sparse factorization. For the model problem, we also compare experimental results with multiple analytical scaling metrics and distinguish between some commonly used weak scaling methods.

References

  1. Blue Waters: Sustained Petascale Computing. National Center for Supercomputing Applications, University of Illinois, http://www.ncsa.uiuc.edu/Blue Waters.Google ScholarGoogle Scholar
  2. MSC Software Products: Engineering Analysis. MSC Software Corporation, Santa Ana, CA. http://www.mscsoftware.com/products.Google ScholarGoogle Scholar
  3. P. R. Amestoy, T. A. Davis, and I. S. Duff. An approximate minimum degree ordering algorithm. SIAM Journal on Matrix Analysis and Applications, 17(4):886--905, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. P. R. Amestoy, I. S. Duff, J. Koster, and J. Y. L'Excellent. A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM Journal on Matrix Analysis and Applications, 23(1):15--41, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. P. R. Amestoy, I. S. Duff, and J. Y. L'Excellent. Multifrontal parallel distributed symmetric and unsymmetric solvers. Computational Methods in Applied Mechanical Engineering, 184:501--520, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  6. E. Chow. Parallel implementation and practical use of sparse approximate inverse preconditioners with a priori sparsity patterns. International Journal of High Performance Computing Applications, 15(1):56--74, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. J. Dongarra, J. D. Croz, S. Hammarling, and I. S. Duff. A set of level 3 Basic Linear Algebra Subprograms. ACM Transactions on Mathematical Software, 16(1):1--17, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. I. S. Duff and J. K. Reid. The multifrontal solution of indefinite sparse symmetric linear equations. ACM Transactions on Mathematical Software, 9(3):302--325, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. R. D. Falgout and U. M. Yang. hypre, High Performance Preconditioners: Users manual. Technical report, Lawrence Livermore National Laboratory, 2006. Paper appears in ICCS 2002.Google ScholarGoogle Scholar
  10. M. Faverge and P. Ramet. Dynamic scheduling for sparse direct solver on NUMA architectures. In Proceedings of PARA '2008, Trondheim, Norway.Google ScholarGoogle Scholar
  11. J. Fettig, S. Koric, and N. Sobh. A comparison of direct and iterative linear sparse solvers in computational solid mechanics. In International Conference on Preconditioning Techniques for Large Sparse Matrix Problems, Napa, CA, 2003.Google ScholarGoogle Scholar
  12. A. George and J. W.-H. Liu. Computer Solution of Large Sparse Positive Definite Systems. Prentice-Hall, NJ, 1981. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. T. George, A. Gupta, and V. Sarin. An empirical analysis of iterative solver performance for SPD systems. Technical Report RC 24737, IBM T. J. Watson Research Center, Yorktown Heights, NY, January 2009.Google ScholarGoogle Scholar
  14. J. R. Gilbert and S. Toledo. An assessment of incomplete-LU preconditioners for nonsymmetric linear systems. Informatica, 24(3):409--425, 2000.Google ScholarGoogle Scholar
  15. A. Gupta. A shared- and distributed-memory parallel sparse direct solver. Applicable Algebra in Engineering, Communication, and Computing, 18(3):263--277, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Gupta. Fast and effective algorithms for graph partitioning and sparse matrix ordering. IBM Journal of Research and Development, 41(1/2):171--183, January/March, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. Gupta. WSMP: Watson sparse matrix package (Part-I: Direct solution of symmetric sparse systems). Technical Report RC 21886, IBM T. J. Watson Research Center, Yorktown Heights, NY, November 2000. http://www.cs.umn.edu/~agupta/wsmp.Google ScholarGoogle Scholar
  18. A. Gupta, M. Joshi, and V. Kumar. WSMP: A high-performance serial and parallel symmetric sparse linear solver. In B. Kagstrom, J. J. Dongarra, E. Elmroth, and J. Wasniewski, editors, PARA '98 Workshop on Applied Parallel Computing in Large Scale Scientific and Industrial Problems. Springer Verlag, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. A. Gupta, G. Karypis, and V. Kumar. Highly scalable parallel algorithms for sparse matrix factorization. IEEE Transactions on Parallel and Distributed Systems, 8(5):502--520, May 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. L. Gustafson. Reevaluating Amdahl's law. Communications of the ACM, 31(5):532--533, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. L. Gustafson, G. R. Montry, and R. E. Benner. Development of parallel methods for a 1024-processor hypercube. SIAM Journal on Scientific and Statistical Computing, 9(4):609--638, 1988.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. P. Hénon, P. Ramet, and J. Roman. PaStiX: A high-performance parallel direct solver for sparse symmetric definite systems. Parallel Computing, 28(2):301--321, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. V. E. Henson and U. M. Yang. BoomerAMG: A parallel algebraic multigrid solver and preconditioner. Applied Numerical Mathematics: Transactions of IMACS, 41(1): 155--177, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. D. Hysom and A. Pothen. A scalable parallel algorithm for incomplete factor preconditioning. SIAM Journal on Scientific Computing, 22(6):2194--2215, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. M. Joshi, A. Gupta, G. Karypis, and V. Kumar. Two-dimensional scalable parallel algorithms for solution of triangular systems. In Proceedings of the 1997 International Conference on High Performance Computing (HiPC), 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. G. Karypis and V. Kumar. ParMETIS: Parallel graph partitioning and sparse matrix ordering library. Technical Report TR 97-060, Department of Computer Science, University of Minnesota, 1997.Google ScholarGoogle Scholar
  27. V. Kumar and A. Gupta. Analyzing scalability of parallel algorithms and architectures. Journal of Parallel and Distributed Computing, 22(3):379--391, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. V. Kumar and V. N. Rao. Parallel depth-first search, part II: Analysis. International Journal of Parallel Programming, 16(6):501--519, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. G. Lakner and C. P. Sosa. IBM System Blue Gene Solution: Blue Gene/P Application Development. IBM, 2008. http://www.redbooks.ibm.com/abstracts/sg247287.html. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. X. S. Li and J. W. Demmel. Making sparse Gaussian elimination scalable by static pivoting. In Supercomputing '98 Proceedings, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. X. S. Li and J. W. Demmel. A scalable sparse direct solver using static pivoting. In Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999.Google ScholarGoogle Scholar
  32. R. J. Lipton, D. J. Rose, and R. E. Tarjan. Generalized nested dissection. SIAM Journal on Numerical Analysis, 16:346--358, 1979.Google ScholarGoogle ScholarCross RefCross Ref
  33. J. W.-H. Liu. The role of elimination trees in sparse factorization. SIAM Journal on Matrix Analysis and Applications, 11:134--172, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. W.-H. Liu. The multifrontal method for sparse matrix solution: Theory and practice. SIAM Review, 34(1):82--109, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. J. W. Ruge and K. Stüben. Multigrid methods. In S. F. McCormick, editor, Frontiers in Applied Mathematics, volume 3, pages 73--130. SIAM, Philadelphia, PA, 1987.Google ScholarGoogle Scholar
  36. Y. Saad. Iterative Methods for Sparse Linear Systems, 2nd edition. SIAM, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. J. Schulze. Towards a tighter coupling of bottom-up and top-down sparse matrix ordering methods. Bit Numerical Mathematics, 41(4):800--841, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  38. X.-H. Sun and J. L. Gustafson. Toward a better parallel performance metric. Parallel Computing, 17(2):1093--1109, 1991.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. X.-H. Sun and L. M. Ni. Another view of parallel speedup. In Supercomputing '90 Proceedings, pages 324--333, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. X.-H. Sun and L. M. Ni. Scalable problems and memory-bounded speedup. Journal of Parallel and Distributed Computing, 19(1):27--37, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. X.-H. Sun and D. T. Rover. Scalability of parallel algorithm-machine combinations. IEEE Transactions on Parallel and Distributed Systems, 5(6):599--613, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. P. H. Worley. The effect of time constraints on scaled speedup. SIAM Journal on Scientific and Statistical Computing, 11(5):838--858, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Sparse matrix factorization on massively parallel computers

                      Recommendations

                      Reviews

                      James Harold Davenport

                      Gupta, Koric, and George consider direct methods for solving large sparse linear systems. As the authors themselves admit that "direct methods have a high asymptotic computational and memory requirement relative to iterative methods," the question to ask is why. Because, as the authors conclusively demonstrate in Section 3, real-world examples exist-in particular, three-dimensional finite element problems-that either cannot be solved iteratively, no matter how much preconditioning is attempted, or where the iterative solution takes longer. Therefore, they explore the performance of their Watson sparse matrix package against two other direct solvers, on a range of symmetric positive definite matrices, on a range of machines, going up to 16,384 cores. The results are too detailed to describe here, but what I took away is the message that at least in this range, one still gets linear speedup, provided that the problem size is allowed to grow with the number of processors. I have one minor quibble: while Gupta, Koric, and George are careful to talk about cores and nodes in the description of the machines, they too often use the term "CPU," forcing a detailed search to discover what they mean. Online Computing Reviews Service

                      Access critical reviews of Computing literature here

                      Become a reviewer for Computing Reviews.

                      Comments

                      Login options

                      Check if you have access through your login credentials or your institution to get full access on this article.

                      Sign in
                      • Published in

                        cover image ACM Conferences
                        SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
                        November 2009
                        778 pages
                        ISBN:9781605587448
                        DOI:10.1145/1654059

                        Copyright © 2009 ACM

                        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                        Publisher

                        Association for Computing Machinery

                        New York, NY, United States

                        Publication History

                        • Published: 14 November 2009

                        Permissions

                        Request permissions about this article.

                        Request Permissions

                        Check for updates

                        Qualifiers

                        • research-article

                        Acceptance Rates

                        SC '09 Paper Acceptance Rate59of261submissions,23%Overall Acceptance Rate1,516of6,373submissions,24%

                      PDF Format

                      View or Download as a PDF file.

                      PDF

                      eReader

                      View online with eReader.

                      eReader