skip to main content
10.1145/2790282.2790292acmconferencesArticle/Chapter ViewAbstractPublication PagesissacConference Proceedingsconference-collections
research-article

High performance implementation of the inverse TFT

Published:10 July 2015Publication History

ABSTRACT

The inverse truncated Fourier transform (ITFT) is a key component in the fast polynomial and large integer algorithms introduced by van der Hoeven. This paper reports a high performance implementation of the ITFT which poses additional challenges compared to that of the forward transform. A general-radix variant of the ITFT algorithm is developed to allow the implementation to automatically adapt to the memory hierarchy. Then a parallel ITFT algorithm is developed that trades off small arithmetic cost for full vectorization and improved multi-threaded parallelism. The algorithms are automatically generated and tuned to produce an arbitrary-size ITFT library. The new algorithms and the implementation smooths out the staircase performance associated with power-of-two modular FFT implementations, and provide significant performance improvement over zero-padding approaches even when high-performance FFT libraries are used.

References

  1. A. Arnold. A new truncated fourier transform algorithm. In Proceedings of the 38th International Symposium on International Symposium on Symbolic and Algebraic Computation, ISSAC '13, pages 15--22, New York, NY, USA, 2013. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. D. H. Bailey. FFTs in external or hierarchical memory. J. Supercomput., 4(1):23--35, Mar. 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. L. I. Bluestein. A linear filtering approach to the computation of discrete Fourier transform. Audio and Electroacoustics, IEEE Transactions on, 18(4):451--455, 1970.Google ScholarGoogle Scholar
  4. J. Cooley and J. Tukey. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19(90):297--301, 1965.Google ScholarGoogle ScholarCross RefCross Ref
  5. F. Franchetti and M. Püschel. Generating high-performance pruned FFT implementations. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 549--552, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. F. Franchetti, Y. Voronenko, and M. Püschel. Formal loop merging for signal transforms. In Programming Languages Design and Implementation (PLDI), pages 315--326, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. F. Franchetti, Y. Voronenko, and M. Püschel. FFT program generation for shared memory: SMP and multicore. In Supercomputing (SC), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. F. Franchetti, Y. Voronenko, and M. Püschel. A rewriting system for the vectorization of signal transforms. In High Performance Computing for Computational Science (VECPAR), volume 4395 of Lecture Notes in Computer Science, pages 363--377. Springer, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Frigo and S. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2):216--231, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  10. J. V. Z. Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, New York, NY, USA, 2 edition, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. I. J. Good. The interaction algorithm and practical Fourier analysis. Journal of the Royal Statistical Society. Series B (Methodological), 20(2): pp. 361--372, 1958.Google ScholarGoogle ScholarCross RefCross Ref
  12. D. Harvey. A cache-friendly truncated FFT. Theor. Comput. Sci., 410(27-29):2649--2658, June 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. Harvey and D. S. Roche. An in-place truncated fourier transform and applications to polynomial multiplication. In Proceedings of the 2010 International Symposium on Symbolic and Algebraic Computation, ISSAC '10, pages 325--329, New York, NY, USA, 2010. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. T. Mateer. Fast Fourier transform algorithms with applications. PhD thesis, Citeseer, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. X. Li, M. Moreno Maza, R. Rasheed, and E. Schost. The modpn library: Bringing fast polynomial arithmetic into Maple. J. Symb. Comput., 46(7):841--858, July 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. M. Maza and Y. Xie. FFT-based dense polynomial arithmetic on multi-cores. In HPCS, pages 378--399, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. L. Meng and J. Johnson. Automatic parallel library generation for general-size modular FFT algorithms. In V. Gerdt, W. Koepf, E. Mayr, and E. Vorozhtsov, editors, Computer Algebra in Scientific Computing, volume 8136 of Lecture Notes in Computer Science, pages 243--256. Springer International Publishing, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. L. Meng and J. Johnson. High performance implementation of the TFT. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ISSAC '14, pages 328--334, New York, NY, USA, 2014. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. L. Meng, Y. Voronenko, J. R. Johnson, M. Moreno Maza, F. Franchetti, and Y. Xie. Spiral-generated modular FFT algorithms. In Proceedings of the 4th International Workshop on Parallel and Symbolic Computation, PASCO '10, pages 169--170, New York, NY, USA, 2010. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. P. L. Montgomery. Modular multiplication without trial division. Mathematics of Computation, 44(170):519--521, 1985.Google ScholarGoogle ScholarCross RefCross Ref
  21. M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", 93(2):232--275, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  22. C. Rader. Discrete Fourier transforms when the number of data samples is prime. Proceedings of the IEEE, 56(6):1107--1108, 1968.Google ScholarGoogle ScholarCross RefCross Ref
  23. J. van der Hoeven. The truncated Fourier transform and applications. In J. Gutierrez, editor, Proc. ISSAC 2004, pages 290--296, Univ. of Cantabria, Santander, Spain, July 4--7 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Y. Voronenko. Library generation for linear transforms. PhD thesis, Electrical and Computer Engineering, Carnegie Mellon University, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J. Xiong, J. Johnson, R. W. Johnson, and D. Padua. SPL: A language and compiler for DSP algorithms. In Programming Languages Design and Implementation (PLDI), pages 298--308, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. High performance implementation of the inverse TFT

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            PASCO '15: Proceedings of the 2015 International Workshop on Parallel Symbolic Computation
            July 2015
            256 pages
            ISBN:9781450335997
            DOI:10.1145/2790282

            Copyright © 2015 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 10 July 2015

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader