research-article

Memory-Efficient Fast Fourier Transform on Streaming Data by Fusing Permutations

Authors:
François Serre

ETH Zurich, Zürich, Switzerland

ETH Zurich, Zürich, Switzerland
View Profile

,
Markus Püschel

ETH Zurich, Zürich, Switzerland

ETH Zurich, Zürich, Switzerland
View Profile

FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate ArraysFebruary 2018Pages 219–228https://doi.org/10.1145/3174243.3174263

Published:15 February 2018Publication History

FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Pages 219–228

ABSTRACT

We propose a novel FFT datapath that reduces the memory requirement compared to state-of-the-art RAM-based implementations by up to a factor of two. The novelty is in a technique to fuse the datapaths for the required perfect shuffle and bit reversal and is applicable to an entire design space of FFT implementations with varying degrees of reuse and number of input ports. We implemented a tool to generate this FFT design space for a given input size and to benchmark against prior work. The results show a reduction of half the RAM banks and/or half the logic complexity used for the permutations. The technique for fusing permutations is more generally applicable beyond the FFT.

References

D. Cohen, “Simplified control of FFT hardware,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24, no. 6, pp. 577--579, 1976.Google ScholarCross Ref
P. Kumhom, J. R. Johnson, and P. Nagvajara, “Design, optimization, and implementation of a universal FFT processor,” in Proc. International ASIC/SOC Conference (ASIC), pp. 182--186, 2000.Google ScholarCross Ref
A. Cortés, I. Vélez, and J. F. Sevillano, “Radix $r^k $FFTs: Matricial representation and SDC/SDF pipeline implementation,” IEEE Transactions on Signal Processing, vol. 57, no. 7, pp. 2824--2839, 2009. Google ScholarDigital Library
S. He and M. Torkelson, “A new approach to pipeline FFT processor,” in Proc. Parallel Processing Symposium (IPPS), pp. 766--770, 1996. Google ScholarDigital Library
B. Akin, F. Franchetti, and J. C. Hoe, “FFTs with near-optimal memory access through block data layouts: Algorithm, architecture and design automation,” Journal of Signal Processing Systems, vol. 85, no. 1, pp. 67--82, 2015. Google ScholarDigital Library
M. C. Pease, “An adaptation of the fast Fourier transform for parallel processing,” Journal of the ACM, vol. 15, no. 2, pp. 252--264, 1968. Google ScholarDigital Library
J. H. Takala, T. S. Jarvinen, and H. T. Sorokin, “Conflict-free parallel memory access scheme for FFT processors,” in Proc. International Symposium on Circuits and Systems (ISCAS), vol. 4, pp. 524--527, 2003.Google ScholarCross Ref
K. J. Page, J. F. Arrigo, and P. M. Chau, “Reconfigurable-hardware-based digital signal processing for wireless communications,” in Advanced Signal Processing Algorithms, Architectures and Implementations (P. SPIE, ed.), vol. 3162, pp. 529--540, 1997.Google Scholar
G. Nordin, P. A. Milder, J. C. Hoe, and M. Püschel, “Automatic generation of customized discrete Fourier transform IPs,” in Proc. Design Automation Conference (DAC), pp. 471--474, 2005. Google ScholarDigital Library
P. A. Milder, F. Franchetti, J. C. Hoe, and M. Püschel, “Computer generation of hardware for linear digital signal processing transforms,” Transactions on Design Automation of Electronic Systems (TODAES), vol. 17, no. 2, pp. 15:1--15:33, 2012. Google ScholarDigital Library
P. A. Milder, J. C. Hoe, and M. Püschel, “Automatic generation of streaming datapaths for arbitrary fixed permutations,” in Proc. Design, Automation and Test in Europe (DATE), pp. 1118--1123, 2009. Google ScholarDigital Library
T. Koehn and P. Athanas, “Arbitrary streaming permutations with minimum memory and latency,” in Proc. International Conference on Computer-Aided Design (ICCAD), pp. 1--6, 2016. Google ScholarDigital Library
K. K. Parhi, “Systematic synthesis of DSP data format converters using life-time analysis and forward-backward register allocation,” IEEE Transactions on Circuits and Systems II (TCAS-II), vol. 39, no. 7, pp. 423--440, 1992.Google ScholarCross Ref
K. J. Page and P. M. Chau, “Folding large regular computational graphs onto smaller processor arrays,” in Advanced Signal Processing Algorithms, Architectures and Implementations (P. SPIE, ed.), vol. 2846, pp. 383--394, 1996.Google Scholar
T. J"arvinen, P. Salmela, H. Sorokin, and J. Takala, “Stride permutation networks for array processors,” in Proc. International Conference on Application-Specific Systems, Architectures and Processors Proceedings (ASAP), pp. 376--386, 2004. Google ScholarDigital Library
M. Püschel, P. A. Milder, and J. C. Hoe, “Permuting streaming data using RAMs,” Journal of the ACM, vol. 56, no. 2, pp. 10:1--10:34, 2009. Google ScholarDigital Library
F. Serre, T. Holenstein, and M. Püschel, “Optimal circuits for streamed linear permutations using RAM,” in Proc. International Symposium on Field-Programmable Gate Arrays (FPGA), pp. 215--223, 2016. Google ScholarDigital Library
F. Serre, “DFT and streamed linear permutation generator for hardware.” https://acl.inf.ethz.ch/research/hardware/, 2018.Google Scholar
M. C. Pease, “The indirect binary n-cube microprocessor array,” IEEE Transactions on Computers, vol. 26, no. 5, pp. 458--473, 1977. Google ScholarDigital Library
J. Lenfant and S. Tahé, “Permuting data with the Omega network,” Acta Informatica, vol. 21, no. 6, pp. 629--641, 1985.Google ScholarCross Ref
F. Serre and M. P" uschel, “Generalizing block LU factorization: A lower-upper-lower block triangular decomposition with minimal off-diagonal ranks,” Linear Algebra and its Applications, vol. 509, pp. 114--142, 2016.Google ScholarCross Ref
F. de Dinechin and B. Pasca, "Designing custom arithmetic data paths with FloPoCo," IEEE Design & Test of Computers, vol. 28, no. 4, pp. 18--27, 2011. Google ScholarDigital Library
P. A. Milder, “Spiral DFT/FFT IP core generator.” http://www.spiral.net/hardware/dftgen.html, 2008.Google Scholar
A. Waksman, “A permutation network,” Journal of the ACM, vol. 15, no. 1, pp. 159--163, 1968. Google ScholarDigital Library
B. G. Jo and M. H. Sunwoo, “New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy,” IEEE Transactions on Circuits and Systems I (TCAS-I), vol. 52, no. 5, pp. 911--919, 2005.Google ScholarCross Ref
M. Garrido, M. Ángel Sánchez, M. L. López-Vallejo, and J. Grajal, “A 4096-point radix-4 memory-based FFT using DSP slices,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 1, pp. 375--379, 2017. Google ScholarDigital Library

Index Terms

Memory-Efficient Fast Fourier Transform on Streaming Data by Fusing Permutations
1. Hardware
2. Theory of computation
  1. Computational complexity and cryptography
    1. Circuit complexity

Recommendations

Optimal Circuits for Streamed Linear Permutations Using RAM
FPGA '16: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

We propose a method to automatically derive hardware structures that perform a fixed linear permutation on streaming data. Linear permutations are permutations that map linearly the bit representation of the elements addresses. This set contains many of ...
Read More
Permuting streaming data using RAMs

This article presents a method for constructing hardware structures that perform a fixed permutation on streaming data. The method applies to permutations that can be represented as linear mappings on the bit-level representation of the data locations. ...
Read More
Algorithm 338: algol procedures for the fast Fourier transform

The following procedures are based on the Cooley-Tukey algorithm [1] for computing the finite Fourier transform of a complex data vector; the dimension of the data vector is assumed here to be a power of two. Procedure COMPLEXTRANSFORM computes either ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2018
310 pages
ISBN:9781450356145
DOI:10.1145/3174243
General Chair:
Jason H. Anderson
University of Toronto, Canada
,
Program Chair:
Kia Bazargan
University of Minnesota, USA
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 15 February 2018
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
bit-reversal
connection network
data reordering
fast fourier transform
linear permutation
streaming datapath
stride permutation
Qualifiers
- research-article
Conference

Acceptance Rates
FPGA '18 Paper Acceptance Rate10of116submissions,9%Overall Acceptance Rate125of627submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 2
  Total Citations
  View Citations
- 351
  Total Downloads
- Downloads (Last 12 months)28
- Downloads (Last 6 weeks)5
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Memory-Efficient Fast Fourier Transform on Streaming Data by Fusing Permutations

FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

ABSTRACT

References

Cited By

Index Terms

Recommendations

Optimal Circuits for Streamed Linear Permutations Using RAM

Permuting streaming data using RAMs

Algorithm 338: algol procedures for the fast Fourier transform

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Memory-Efficient Fast Fourier Transform on Streaming Data by Fusing Permutations

FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

ABSTRACT

References

Cited By

Index Terms

Recommendations

Optimal Circuits for Streamed Linear Permutations Using RAM

Permuting streaming data using RAMs

Algorithm 338: algol procedures for the fast Fourier transform

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media