Article DOI: 10.1145/968280.968304

A quantitative analysis of the speedup factors of FPGAs over processors

Published: 22 February 2004

ABSTRACT

The speedup over a microprocessor that can be achieved by implementing some programs on an FPGA has been extensively reported. This paper presents an analysis, both quantitative and qualitative, at the architecture level of the components of this speedup. The spatial parallelism that can be exploited on the FPGA is clearly a large component. By itself, however, it does not account for the whole speedup.

In this paper we experimentally analyze the remaining components of the speedup. We compare the performance of image processing application programs executing in hardware on a Xilinx Virtex E2000 FPGA to that on three general-purpose processor platforms: MIPS, Pentium III and VLIW. The question we set out to answer is: what is the inherent advantage of a hardware implementation over a von Neumann platform? On the one hand, the clock frequency of general-purpose processors is about 20 times that of typical FPGA implementations. On the other hand, the iteration-level parallelism on the FPGA is one to two orders of magnitude higher than that on the CPUs. In addition to these two factors, we identify the efficiency advantage of FPGAs as an important factor and show that it ranges from 6 to 47 on our test benchmarks. We also identify some of the components of this factor: the streaming of data from memory, the overlap of control and data flow, and the elimination of some instructions on the FPGA. The results provide a deeper understanding of the tradeoff between system complexity and performance when designing configurable SoCs (CSoCs), as well as when designing software for CSoCs. They also help explain the one to two orders of magnitude in speedup of FPGAs over CPUs after accounting for clock frequencies.
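
As a rough illustration of how the three factors named in the abstract might interact, the sketch below combines them in a simple multiplicative model. The abstract does not state that the factors combine this way, so the formula, the function name `estimated_speedup`, and the pairing of low with low and high with high values are all assumptions made for illustration. The numeric inputs are the ranges quoted in the abstract; the outputs are not results from the paper.

```python
def estimated_speedup(clock_ratio, parallelism, efficiency):
    """Illustrative model only (an assumption, not taken from the paper):
    the FPGA gains from parallelism and efficiency, and loses by the
    factor by which the CPU clock is faster."""
    return parallelism * efficiency / clock_ratio


# CPU clock is "about 20 times" the FPGA clock, per the abstract.
CLOCK_RATIO = 20.0

# Iteration-level parallelism: "one to two orders of magnitude" (10 to 100).
# Efficiency advantage: "ranges from 6 to 47" on the test benchmarks.
low = estimated_speedup(CLOCK_RATIO, parallelism=10, efficiency=6)
high = estimated_speedup(CLOCK_RATIO, parallelism=100, efficiency=47)

print(f"Hypothetical speedup bounds under this model: {low:.1f}x to {high:.1f}x")
```

Under this assumed model, the 20x clock disadvantage is only overcome when the product of parallelism and efficiency exceeds 20, which is one way to see why parallelism alone need not account for the whole reported speedup.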


            Published in:
            FPGA '04: Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
            February 2004
            266 pages
            ISBN:1581138296
            DOI:10.1145/968280

            Copyright © 2004 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Qualifiers

            • Article

            Acceptance Rates

            Overall Acceptance Rate: 125 of 627 submissions, 20%
