ABSTRACT
Conventional algorithms for computing large one-dimensional fast Fourier transforms (FFTs), even those algorithms recently developed for vector and parallel computers, are largely unsuitable for systems with external or hierarchical memory. The principal reason for this is the fact that most FFT algorithms require at least m complete passes through the data set to compute a 2m-point FFT.
This paper describes some advanced techniques for computing an ordered FFT on a computer with external or hierarchical memory. These algorithms (1) require as few as two passes through the external data set, (2) employ strictly unit stride, long vector transfers between main memory and external storage, (3) require only a modest amount of scratch space in main memory, and (4) are well suited for vector and parallel computation.
Performance figures are included for implementations of some of these algorithms on Cray supercomputers. Of interest is the fact that a main memory version outperforms the current Cray library FFT routines on the Cray-2, the Cray X-MP, and the Cray Y-MP systems. Using all eight processors on the Cray Y-MP, this main memory routine runs at nearly two gigaflops.
- 1.Agarwal, R. C., and Cooley, J. W., "Fourier Transform and Convolution Subroutines for the IBM 3090 Vector Facility",/BM Journa/ o{Research and Development, vol. 30 (1986), p. 145- 162. Google ScholarDigital Library
- 2.Armstrong, J., "A Multi-Algoritlvm Approach to Very High Performance 1D FFTs", Journa/ of Supercomputing, to appear.Google Scholar
- 3.Bailey, D. H., "A High-Performance Fast Fourier Transform Algorithm for the Cray-2", Journal of Supercomputing, vol. 1 (1987), p. 43- 60.Google ScholarCross Ref
- 4.Bailey, D. H., "A High.Performance FFT Algorithm for Vector Sup~computers",/nternational Journal o~ Super~omputer Applications vol. 2 (1988), p. 82- 87.Google ScholarDigital Library
- 5.Fraser, D., "Array Per~mtation by Index-Digit Permutation", Journa/of" the Association for Computing Machinery, vol. 23 (1976), p. 298 309. Google ScholarDigital Library
- 6.Gentleman, W. M., and Sande, G., "Fast Fourier Transforms - For Fun and Profit", AFIPS Proceedings, v,d. 29 (1966), p. 563 578.Google Scholar
- 7.Swarztrauber, P. N., "FFT Algorithms for Vector Computers", Parallel Computing, 1 (1984), p. 45- 63.Google ScholarDigital Library
- 8.Swarztrauber~ P. N., "Multiprocessor FFTs", Parallel Computing, vol. 5 (1987), p. 197- 210.Google ScholarCross Ref
Index Terms
- FFTs in external of hierarchical memory
Recommendations
On non-blocking collectives in 3D FFTs
ScalA '11: Proceedings of the second workshop on Scalable algorithms for large-scale systemsWith the inclusion of non-blocking global collective operations in the MPI 3.0 draft specification many fundamental algorithms such as those for performing 3-dimensional (3D) FFTs will be modified to take advantage of non-blocking collectives. Novel ...
Comments