skip to main content
10.1145/3174243.3174263acmconferencesArticle/Chapter ViewAbstractPublication PagesfpgaConference Proceedingsconference-collections
research-article

Memory-Efficient Fast Fourier Transform on Streaming Data by Fusing Permutations

Published:15 February 2018Publication History

ABSTRACT

We propose a novel FFT datapath that reduces the memory requirement compared to state-of-the-art RAM-based implementations by up to a factor of two. The novelty is in a technique to fuse the datapaths for the required perfect shuffle and bit reversal and is applicable to an entire design space of FFT implementations with varying degrees of reuse and number of input ports. We implemented a tool to generate this FFT design space for a given input size and to benchmark against prior work. The results show a reduction of half the RAM banks and/or half the logic complexity used for the permutations. The technique for fusing permutations is more generally applicable beyond the FFT.

References

  1. D. Cohen, “Simplified control of FFT hardware,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24, no. 6, pp. 577--579, 1976.Google ScholarGoogle ScholarCross RefCross Ref
  2. P. Kumhom, J. R. Johnson, and P. Nagvajara, “Design, optimization, and implementation of a universal FFT processor,” in Proc. International ASIC/SOC Conference (ASIC), pp. 182--186, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  3. A. Cortés, I. Vélez, and J. F. Sevillano, “Radix $r^k $FFTs: Matricial representation and SDC/SDF pipeline implementation,” IEEE Transactions on Signal Processing, vol. 57, no. 7, pp. 2824--2839, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. S. He and M. Torkelson, “A new approach to pipeline FFT processor,” in Proc. Parallel Processing Symposium (IPPS), pp. 766--770, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. B. Akin, F. Franchetti, and J. C. Hoe, “FFTs with near-optimal memory access through block data layouts: Algorithm, architecture and design automation,” Journal of Signal Processing Systems, vol. 85, no. 1, pp. 67--82, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. C. Pease, “An adaptation of the fast Fourier transform for parallel processing,” Journal of the ACM, vol. 15, no. 2, pp. 252--264, 1968. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. H. Takala, T. S. Jarvinen, and H. T. Sorokin, “Conflict-free parallel memory access scheme for FFT processors,” in Proc. International Symposium on Circuits and Systems (ISCAS), vol. 4, pp. 524--527, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  8. K. J. Page, J. F. Arrigo, and P. M. Chau, “Reconfigurable-hardware-based digital signal processing for wireless communications,” in Advanced Signal Processing Algorithms, Architectures and Implementations (P. SPIE, ed.), vol. 3162, pp. 529--540, 1997.Google ScholarGoogle Scholar
  9. G. Nordin, P. A. Milder, J. C. Hoe, and M. Püschel, “Automatic generation of customized discrete Fourier transform IPs,” in Proc. Design Automation Conference (DAC), pp. 471--474, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. P. A. Milder, F. Franchetti, J. C. Hoe, and M. Püschel, “Computer generation of hardware for linear digital signal processing transforms,” Transactions on Design Automation of Electronic Systems (TODAES), vol. 17, no. 2, pp. 15:1--15:33, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. P. A. Milder, J. C. Hoe, and M. Püschel, “Automatic generation of streaming datapaths for arbitrary fixed permutations,” in Proc. Design, Automation and Test in Europe (DATE), pp. 1118--1123, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. T. Koehn and P. Athanas, “Arbitrary streaming permutations with minimum memory and latency,” in Proc. International Conference on Computer-Aided Design (ICCAD), pp. 1--6, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. K. K. Parhi, “Systematic synthesis of DSP data format converters using life-time analysis and forward-backward register allocation,” IEEE Transactions on Circuits and Systems II (TCAS-II), vol. 39, no. 7, pp. 423--440, 1992.Google ScholarGoogle ScholarCross RefCross Ref
  14. K. J. Page and P. M. Chau, “Folding large regular computational graphs onto smaller processor arrays,” in Advanced Signal Processing Algorithms, Architectures and Implementations (P. SPIE, ed.), vol. 2846, pp. 383--394, 1996.Google ScholarGoogle Scholar
  15. T. J"arvinen, P. Salmela, H. Sorokin, and J. Takala, “Stride permutation networks for array processors,” in Proc. International Conference on Application-Specific Systems, Architectures and Processors Proceedings (ASAP), pp. 376--386, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Püschel, P. A. Milder, and J. C. Hoe, “Permuting streaming data using RAMs,” Journal of the ACM, vol. 56, no. 2, pp. 10:1--10:34, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. F. Serre, T. Holenstein, and M. Püschel, “Optimal circuits for streamed linear permutations using RAM,” in Proc. International Symposium on Field-Programmable Gate Arrays (FPGA), pp. 215--223, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. F. Serre, “DFT and streamed linear permutation generator for hardware.” https://acl.inf.ethz.ch/research/hardware/, 2018.Google ScholarGoogle Scholar
  19. M. C. Pease, “The indirect binary n-cube microprocessor array,” IEEE Transactions on Computers, vol. 26, no. 5, pp. 458--473, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. Lenfant and S. Tahé, “Permuting data with the Omega network,” Acta Informatica, vol. 21, no. 6, pp. 629--641, 1985.Google ScholarGoogle ScholarCross RefCross Ref
  21. F. Serre and M. P" uschel, “Generalizing block LU factorization: A lower-upper-lower block triangular decomposition with minimal off-diagonal ranks,” Linear Algebra and its Applications, vol. 509, pp. 114--142, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  22. F. de Dinechin and B. Pasca, "Designing custom arithmetic data paths with FloPoCo," IEEE Design & Test of Computers, vol. 28, no. 4, pp. 18--27, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. P. A. Milder, “Spiral DFT/FFT IP core generator.” http://www.spiral.net/hardware/dftgen.html, 2008.Google ScholarGoogle Scholar
  24. A. Waksman, “A permutation network,” Journal of the ACM, vol. 15, no. 1, pp. 159--163, 1968. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. B. G. Jo and M. H. Sunwoo, “New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy,” IEEE Transactions on Circuits and Systems I (TCAS-I), vol. 52, no. 5, pp. 911--919, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  26. M. Garrido, M. Ángel Sánchez, M. L. López-Vallejo, and J. Grajal, “A 4096-point radix-4 memory-based FFT using DSP slices,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 1, pp. 375--379, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Memory-Efficient Fast Fourier Transform on Streaming Data by Fusing Permutations

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
            February 2018
            310 pages
            ISBN:9781450356145
            DOI:10.1145/3174243

            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 15 February 2018

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            FPGA '18 Paper Acceptance Rate10of116submissions,9%Overall Acceptance Rate125of627submissions,20%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader