ACM Home Page
Please provide us with feedback. Feedback
Use of parallel level 3 BLAS in LU factorization on three vector multiprocessors the ALLIANT FX/80, the CRAY-2, and the IBM 3090 VF
Full text pdf formatPdf (1.22 MB)
Source International Conference on Supercomputing archive
Proceedings of the 4th international conference on Supercomputing table of contents
Amsterdam, The Netherlands
Pages: 82 - 95  
Year of Publication: 1990
ISBN:0-89791-369-8
Also published in ...
Authors
M. J. Daydé  CERFACS, 42 Av. G. Coriolis, 31047 Toulouse Cedex, France and ENSEEIHT-IRIT, 2 rue Camichel 31071 Toulouse Cedex, France
I. S. Duff  CERFACS, 42 Av. G. Coriolis, 31047 Toulouse Cedex, France and CSS Division, Harwell Laboratory, OXON OX11 0RA, England
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 46,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues   peer to peer  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/77726.255142
What is a DOI?

ABSTRACT

We show how to transform the B-spline curve and surface fitting problems into suffix computations of continued fractions. Then a parallel substitution scheme is introduced to compute the suffix values on a newly proposed mesh-of-unshuffle network. The derived parallel algorithm allows the curve interpolation through n points to be solved in &Ogr;(log n) time using &THgr;n/log n) processors and allows the surface interpolation through m x n points to be solved in &Ogr;(log m log n) time using &THgr; (mn/(log m log n)) processors. Both interpolation algorithms are cost-optimal for their respective problems. Besides, the surface fitting problem can be even faster solved in &Ogr;(log m + log n) time if &THgr;(mn) processors are used in the network.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
Bischof, C., Demmel, J., Dongarra J., DuCroz J., Greertbaum, A., Hammarling S., and Soreusen, D. (1988). LAPACK Working Note #5 : Provisional Contents. Report ANL-88-38. Mathematics and Computer Science Division, Argonne National Laboratory.
 
3
Bucher, I. and Jordan, T. (1984). Linear algebra programs for use on a vector computer with a secondary solid state storage device. In Advances in Computer Methods for Partial Differential Equations, E. R. Vichnevetsky and R. Stepleman (Eds), IMACS, 546-550.
 
4
Calahan, D.A. (1986). Block-oriented, localmemory-based linear equation solution on the CRAY-2: uniprocessor algorithms. Proceedings International Conference on Parallel Processing, August 1986. IEEE Computer Society Press, 375-378.
 
5
Carnevali, P., Radicati di Brozolo, G., Robert, Y., and Sguazzero, P. (1987). Efficient FORTRAN implementation of the Gaussian elimination and Householder reduction algorithms on the IBM 3090 vector multiprocessor. Report ICE- 0012, Rome : IBM European Center for Scientific and Engineering Computing, italy.
 
6
Dayd6, M. and Duff, I.S. (1989). Use of Level 3 BLAS in LU factorization ,on the CRAY-2, the ETA 10-P, and the IBM 3090/VF. Int. J. Supercomputers Applications 3(2), 40-70.
 
7
Dayd6, M. and Duff, I.S. (1991)). Use of Level 3 BLAS in LU factorization in a multiprocessing environment on three vector multiprocessors the ALLIANT F~80, the CRAY-2, and the IBM 3090/VF. CERFACS Report (To appear).
 
8
Dekker, E. (I989). Some aspects of the CRAY.2 architecture. Report TR 89/'8, CERFACS.
 
9
Demmel, J.W., Dongarra, J.J., Du Croz, J., Greenbaum, A., Hammarling, S., and Sorensen, D. C. (1987). Prospectus for the development of a linear algebra library for high-performance computers. Report TM-97, Mathematics and Computer Science Division, Argonne National Laboratory.
 
10
Dongarra, J.J. (1988). Performance of various computers using standard linear equations software in a Fortran environment. Report TM 23. Mathematics and Computer Science Division, Argonne National Laboratory.
11
12
 
13
Dongarra, J.J., Du Croz, J., Duff, I.S., and Hammarling, S. (1988b). A set of level 3 Basic Linear Algebra Subprograms: model implementation and test programs. Report AERE R 13298, Harwell ~tboratory. To appear in A CM Tram. Math. Softw.
 
14
Dongarra, J.J., Gustavson, F. G., and Karp, A. (1984). Implementing linear algebra algorithms for dense matrices on a vector pipeline machine. SIAM Review 26, 91-112.
 
15
 
16
Gallivan, K., jalby, W., Meier, U., and Sameh A. (1988). impact of hierarchical memory systems on linear algebra algorithm design. Int. J. Supercomputers Applications 2( 1),12-48.
 
17
IBM (1986). Engineering and Scientific Subroutine Library. Program Number: 5668-863, IBM.
 
18
Kagstrom B., and Ling, P. (1988). Level 2 and Level 3 BLAS routines for IBM 3090 VF/400 : implementations and experiences. Report UMINF 154.88, University of Umea, Sweden.
19
20
21
 
22
Robert, Y. and Sguazzero, P. (1987). The LU decomposition algorithm and its efficient Fortran implementation on the IBM 3090 vector multiprocessor. Report ICE-0006, Rome : IBM European Center for Scientific and Engineering Computing, Italy.
 
23
Sheikh, Q., and Liu, J. (1989). Basic Linear Algebra Subprogram Optimization on the CRAY-2 System. Cray Channels, Spring 1989.


Collaborative Colleagues:
M. J. Daydé: colleagues
I. S. Duff: colleagues

Peer to Peer - Readers of this Article have also read: