ACM Home Page
Please provide us with feedback. Feedback
Memory efficient parallel matrix multiplication operation for irregular problems
Full text PdfPdf (546 KB)
Source Conference On Computing Frontiers archive
Proceedings of the 3rd conference on Computing frontiers table of contents
Ischia, Italy
SESSION: Applications I table of contents
Pages: 229 - 240  
Year of Publication: 2006
ISBN:1-59593-302-6
Authors
Manojkumar Krishnan  Pacific Northwest National Laboratory, Richland, WA
Jarek Nieplocha  Pacific Northwest National Laboratory, Richland, WA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 123,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1128022.1128054
What is a DOI?

ABSTRACT

Regular distributions for storing dense matrices on parallel systems are not always used in practice. In many scientific applicati RUMMA) [1] to handle irregularly distributed matrices. Our approach relies on a distribution independent algorithm that provides dynamic load balancing by exploiting data locality and achieves performance as good as the traditional approach which relies on temporary arrays with regular distribution, data redistribution, and matrix multiplication for regular matrices to handle the irregular case. The proposed algorithm is memory-efficient because temporary matrices are not needed. This feature is critical for systems like the IBM Blue Gene/L that offer very limited amount of memory per node. The experimental results demonstrate very good performance across the range of matrix distributions and problem sizes motivated by real applications.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
M. Krishnan, J. Nieplocha, "SRUMMA: A Matrix Multiplication Algorithm Suitable for Clusters and Scalable Shared Memory Systems", Proc. IPDPS'04, 2004.
 
2
 
3
G. C. Fox, S. W. Otto, A. J. G. Hey, "Matrix algorithms on a hypercube I: Matrix multiplication", Parallel Computing, vol. 4, pp. 17--31, 1987.
 
4
 
5
G.H. Golub, C.H Van Loan. Matrix Computations. Johns Hopkins University Press, 1989.
 
6
J. Berntsen, Communication efficient matrix multiplication on hypercubes, Parallel Computing, vol. 12, 1989.
 
7
A. Gupta and V. Kumar, "Scalability of Parallel Algorithms for Matrix Multiplication", ICPP'93, 1993.
 
8
C. Lin and L.Snyder, "A matrix product algorithm and its comparative performance on hypercubes", SHPCC '92.
 
9
Q. Luo and J. B. Drake, A Scalable Parallel Strassen's Matrix Multiply Algorithm for Distributed Memory Computers, http://citeseer.nj.nec.com/517382.html
 
10
S. Huss-Lederman, E. M. Jacobson, and A. Tsao, "Comparison of Scalable Parallel Matrix Multiplication Libraries," in Proceedings of the Scalable Parallel Libraries Conference, 1994.
 
11
C. T. Ho, S. L. Johnsson, A. Edelman, Matrix multiplication on hypercubes using full bandwidth and constant storage, Proc. 6 Distributed Memory Computing Conference. 1991.
12
 
13
J. Li, A. Skjellum, and R. D. Falgout, "A Poly-Algorithm for Parallel Dense Matrix Multiplication on Two-Dimensional Process Grid Topologies," Concurrency, Practice and Experience, vol. 9(5), pp. 345--389, 1997.
 
14
E. Dekel, D. Nassimi, S. Sahni, Parallel matrix and graph algorithms, SIAM J. Computing, vol. 10, 1981.
 
15
 
16
J. Choi, J. Dongarra, and D. W. Walker, "PUMMA: Parallel Universal Matrix Multiplication Algorithms on distributed memory concurrent computers," Concurrency: Practice and Experience, vol. 6(7), 1994.
 
17
S. Huss-Lederman, E. Jacobson, A. Tsao, and G. Zhang, "Matrix Multiplication on the Intel Touchstone DELTA", Concurrency: Practice and Experience, vol. 6 (7). Oct 1994.
 
18
 
19
R. van de Geijn, R. and J. Watts, "SUMMA: Scalable Universal Matrix Multiplication Algorithm," Concurrency: Practice and Experience, vol. 9(4), pp. 255--274, April 1997.
 
20
 
21
 
22
 
23
C. Addison and Y. Ren, "OpenMP Issues Arising in the Development of Parallel BLAS and LAPACK libraries", in Proceedings EWOMP'01. 2001.
 
24
G.R. Luecke, W. Lin, "Scalability and Performance of OpenMP and MPI on a 128-Processor SGI Origin 2000", Concurrency and Computation: Practice and Experience, vol. 13, pp 905--928. 2001.
 
25
 
26
T. Betcke, "Performance analysis of various parallelization methods for BLAS3 routines on cluster architectures",John von Neumann-Instituts fur Computing, Tech. Rep. FZJ-ZAM-IB-2000-15, 2000.
 
27
J. L. Träff, H. Ritzdorf, R. Hempel "The Implementation of MPI-2 One-Sided Communication for the NEC SX-5", in Proceedings of Supercomputing, 2000.
28
 
29
J. Nieplocha, V. Tipparaju, M. Krishnan, G. Santhanaraman, and D.K. Panda," Optimizing Mechanisms for Latency Tolerance in Remote Memory Access Communication on Clusters", IEEE CLUSTER'03, 2003.
 
30
A. Grama, A. Gupta, G. Karypis, and V. Kumar, Introduction to Parallel Computing, Addison Wesley, 2003.
 
31
Cray Online documentation. Optimizing Applications on the Cray X1TM System. http://www.cray.com/craydoc/20/manuals/S-2315-50/html-S-2315-50/S-2315-50-toc.html
 
32
 
33
ARMCI Web page. http://www.emsl.pnl.gov/docs/parsoft/armci/
 
34
 
35
 
36
 
37
T.H. Dunning, Jr. J. Chem. Phys. 90, 1007 (1989).
 
38
Y. Alexeev, M. Valiev, D. A. Dixon, T. L. Windus, "Ab initio study of catalytic GTP hydrolysis", J. of ACS, '04.
 
39
I. Foster et al. "Toward High-Performance Computational Chemistry: I. Scalable Fock Matrix Construction Algorithms", Journal of computational chemistry, vol. 17, No. 1, 109--123, 1996.
 
40
 
41
 
42
Hyuk-Jae Lee, J.A.B. Fortes, Toward data distribution independent parallel matrix multiplication, IPDPS, 1995.
43
 
44
Banicescu, Ioana and R. Lu, Experiences with Fractiling in N-Body Simulations, HPC Symposium, 1998.
 
45
Kendall et al, "High Performance Computational Chemistry: an Overview of NWChem a Distributed Parallel Application", Computer Phys. Comm., 2000, 128, 260--283.


Collaborative Colleagues:
Manojkumar Krishnan: colleagues
Jarek Nieplocha: colleagues