ABSTRACT
The speedup over a microprocessor that can be achieved by implementing some programs on an FPGA has been extensively reported. This paper presents an analysis, both quantitative and qualitative, at the architecture level of the components of this speedup. The spatial parallelism that can be exploited on the FPGA is clearly a major component. By itself, however, it does not account for the whole speedup. In this paper we experimentally analyze the remaining components of the speedup. We compare the performance of image processing application programs executing in hardware on a Xilinx Virtex E2000 FPGA to that on three general-purpose processor platforms: MIPS, Pentium III and VLIW. The question we set out to answer is: what is the inherent advantage of a hardware implementation over a von Neumann platform? On the one hand, the clock frequency of general-purpose processors is about 20 times that of typical FPGA implementations. On the other hand, the iteration-level parallelism on the FPGA is one to two orders of magnitude greater than that on the CPUs. In addition to these two factors, we identify the efficiency advantage of FPGAs as an important factor and show that it ranges from 6 to 47 on our test benchmarks. We also identify some of the components of this factor: the streaming of data from memory, the overlap of control and data flow, and the elimination of some instructions on the FPGA. The results provide a deeper understanding of the tradeoff between system complexity and performance when designing configurable SoCs (CSoCs) as well as when designing software for CSoCs. They also help explain the one to two orders of magnitude in speedup of FPGAs over CPUs after accounting for clock frequencies.
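To make the interplay of the three factors concrete, the following is a minimal sketch that assumes they combine multiplicatively: the FPGA gains from iteration-level parallelism and from efficiency, and loses from its lower clock frequency. This composition is an assumption made here for illustration; the abstract does not state the exact model. The function name `estimated_speedup` and the parallelism values 10 and 100 (taken as the endpoints of "one to two orders of magnitude") are likewise hypothetical. Only the clock ratio of about 20 and the efficiency range of 6 to 47 come from the abstract.

```python
def estimated_speedup(clock_ratio_cpu_over_fpga: float,
                      parallelism_ratio: float,
                      efficiency_factor: float) -> float:
    """Rough FPGA-over-CPU speedup under an assumed multiplicative model.

    clock_ratio_cpu_over_fpga: CPU clock frequency divided by FPGA clock frequency.
    parallelism_ratio: FPGA iteration-level parallelism divided by that of the CPU.
    efficiency_factor: efficiency advantage of the FPGA (6 to 47 in the abstract).
    """
    if clock_ratio_cpu_over_fpga <= 0:
        raise ValueError("clock ratio must be positive")
    # Assumption: the factors are independent and simply multiply.
    return parallelism_ratio * efficiency_factor / clock_ratio_cpu_over_fpga


if __name__ == "__main__":
    clock_ratio = 20  # "about 20 times", from the abstract
    for parallelism in (10, 100):      # illustrative endpoints, not measured values
        for efficiency in (6, 47):     # range reported in the abstract
            s = estimated_speedup(clock_ratio, parallelism, efficiency)
            print(f"parallelism={parallelism:>3}, efficiency={efficiency:>2} "
                  f"-> estimated speedup {s:.1f}x")
```

Under this assumed model, the 20-fold clock disadvantage would cancel most of the parallelism gain at the low end, which is one way to read the claim that parallelism alone does not account for the whole speedup.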
A quantitative analysis of the speedup factors of FPGAs over processors